# **Thomas Wies (Ed.)**

# **Programming Languages and Systems**

**32nd European Symposium on Programming, ESOP 2023 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023 Paris, France, April 22–27, 2023 Proceedings**

# Lecture Notes in Computer Science 13990

Founding Editors

Gerhard Goos, Germany Juris Hartmanis, USA

# Editorial Board Members

Elisa Bertino, USA Wen Gao, China

Bernhard Steffen, Germany Moti Yung, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, Peking University, Beijing, China Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at https://link.springer.com/bookseries/558


Editor Thomas Wies New York University New York, NY, USA

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-031-30043-1 ISBN 978-3-031-30044-8 (eBook) https://doi.org/10.1007/978-3-031-30044-8

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# ETAPS Foreword

Welcome to the 26th ETAPS! ETAPS 2023 took place in Paris, the beautiful capital of France. ETAPS 2023 was the 26th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organising these conferences in a coherent, highly synchronized conference programme enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe.

ETAPS 2023 received 361 submissions in total, 124 of which were accepted, yielding an overall acceptance rate of 34.3%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2023 featured the unifying invited speakers Véronique Cortier (CNRS, LORIA laboratory, France) and Thomas A. Henzinger (Institute of Science and Technology, Austria) and the conference-specific invited speakers Mooly Sagiv (Tel Aviv University, Israel) for ESOP and Sven Apel (Saarland University, Germany) for FASE. Invited tutorials were provided by Ana-Lucia Varbanescu (University of Twente and University of Amsterdam, The Netherlands) on heterogeneous computing and Joost-Pieter Katoen (RWTH Aachen, Germany and University of Twente, The Netherlands) on probabilistic programming.

As part of the programme we had the second edition of TOOLympics, an event to celebrate the achievements of the various competitions or comparative evaluations in the field of ETAPS.

ETAPS 2023 was organized jointly by Sorbonne Université and Université Sorbonne Paris Nord. Sorbonne Université (SU) is a multidisciplinary, research-intensive and world-class academic institution. It was created in 2018 as the merger of two first-class research-intensive universities, UPMC (Université Pierre et Marie Curie) and Paris-Sorbonne. SU has three faculties: humanities, medicine, and science and engineering. It has 55,600 students (4,700 PhD students; 10,200 international students), 6,400 teachers and professor-researchers, and 3,600 administrative and technical staff members. Université Sorbonne Paris Nord is one of the thirteen universities that succeeded the University of Paris in 1968. It is a major teaching and research center located in the north of Paris. It has five campuses, spread over the two departments of Seine-Saint-Denis and Val d'Oise: Villetaneuse, Bobigny, Saint-Denis, the Plaine Saint-Denis and Argenteuil. The university has more than 25,000 students in different fields, such as health, medicine, languages, humanities, and science. The local organization team consisted of Fabrice Kordon (general co-chair), Laure Petrucci (general co-chair), Benedikt Bollig (workshops), Stefan Haar (workshops), Étienne André (proceedings and tutorials), Céline Ghibaudo (sponsoring), Denis Poitrenaud (web), Stefan Schwoon (web), Benoît Barbot (publicity), Nathalie Sznajder (publicity), Anne-Marie Reytier (communication), Hélène Pétridis (finance) and Véronique Criart (finance).

ETAPS 2023 is further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), EASST (European Association of Software Science and Technology), Lip6 (Laboratoire d'Informatique de Paris 6), LIPN (Laboratoire d'informatique de Paris Nord), Sorbonne Université, Université Sorbonne Paris Nord, CNRS (Centre national de la recherche scientifique), CEA (Commissariat à l'énergie atomique et aux énergies alternatives), LMF (Laboratoire méthodes formelles), and Inria (Institut national de recherche en informatique et en automatique).

The ETAPS Steering Committee consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König (Duisburg), Thomas Noll (Aachen), Caterina Urban (Inria), Jan Křetínský (Munich), and Lenore Zuck (Chicago).

Other members of the steering committee are: Dirk Beyer (Munich), Luís Caires (Lisboa), Ana Cavalcanti (York), Bernd Finkbeiner (Saarland), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna), Orna Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Elizabeth Polgreen (Edinburgh), Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella (Edinburgh), Natasha Sharygina (Lugano), Pawel Sobocinski (Tallinn), Sebastián Uchitel (London and Buenos Aires), Andrzej Wasowski (Copenhagen), Stephanie Weirich (Pennsylvania), Thomas Wies (New York), Anton Wijs (Eindhoven), and James Worrell (Oxford).

I would like to take this opportunity to thank all authors, keynote speakers, attendees, organizers of the satellite workshops, and Springer-Verlag GmbH for their support. I hope you all enjoyed ETAPS 2023.

Finally, a big thanks to Laure and Fabrice and their local organization team for all their enormous efforts to make ETAPS a fantastic event.

April 2023 Marieke Huisman ETAPS SC Chair ETAPS e.V. President

# Preface

This volume contains the papers accepted at the 32nd European Symposium on Programming (ESOP 2023), held during April 22–27, 2023, in Paris, France. ESOP is one of the European Joint Conferences on Theory and Practice of Software (ETAPS); it is dedicated to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.

The 20 papers in this volume were selected from 55 submissions based on their originality and quality. One submission was desk rejected due to formatting issues. Each of the remaining submissions received at least three reviews. Authors were given the opportunity to respond to the initial reviews of their papers during the rebuttal period, December 6–8, 2022. Afterwards, the papers were discussed by the 30 Program Committee (PC) members and the 37 external reviewers. ESOP 2023 followed a double-blind review process. Roland Meyer kindly handled the two papers for which the PC Chair had conflicts of interest.

ESOP 2023 continued the artifact evaluation process established by ESOP 2022. For this edition, the evaluation was conducted by a joint Artifact Evaluation Committee (AEC) with FoSSaCS 2023. Authors of accepted papers were invited to submit artifacts, such as code, datasets, and mechanized proofs that supported the conclusions of their papers. The AEC members read the papers and explored the artifacts, assessing their quality and checking that they supported the authors' claims. The authors of seven of the accepted papers submitted artifacts, which were evaluated by 21 AEC members, with each artifact receiving at least three reviews. Authors of papers with accepted artifacts were assigned official EAPLS artifact evaluation badges, indicating that they have taken the extra time and have undergone the extra scrutiny to prepare a useful artifact. The ESOP 2023 AEC awarded Artifact Functional, Artifact (Functional and) Reusable, and Artifact Available badges. All submitted artifacts were deemed Functional and Available, and all but two were also found to be Reusable.

I sincerely thank everyone who contributed to the success of the conference. Foremost, my deep gratitude goes to the authors who submitted their works for review, providing the basis for an exciting conference program. I would like to thank the members of the ESOP 2023 Program Committee for their detailed and constructive reviews, and for their active participation in the online discussions. The external reviewers provided additional expertise that was often crucial to arrive at an informed decision. For this, they have my deepest gratitude. I also thank Niccolò Veltri and Sebastian Wolff for serving as co-chairs of the joint ESOP/FoSSaCS 2023 Artifact Evaluation Committee. It was an honor to work with all of you! Finally, I would like to thank all who contributed to the organization of ESOP 2023: the ESOP steering committee and its chairs Luis Caires and Peter Thiemann, as well as the ETAPS steering committee and its chair Marieke Huisman, who often provided helpful guidance and feedback.

April 2023 Thomas Wies

# Organization

# Program Committee


# Additional Reviewers

Abuah, Chike; Aman, Bogdan; Anastasiadi, Elli; Barrière, Aurèle; Bovel, Matthieu; Chassot, Samuel; Denis, Xavier; DeYoung, Henry; Di Giorgio, Alessandro; Eilers, Marco; Frumin, Daniil; Genaim, Samir; Goel, Aman; Goldstein, Mark; Gordillo, Pablo; Greenman, Ben; Grosen, Jessie; Ho, Son; Isabel, Miguel; Jacobs, Jules; Jothimurugan, Kishor; Khyzha, Artem; Kuperberg, Denis; Lam, Kait; Li, Yao; Liquori, Luigi; Middelkoop, Adriaan; Miné, Antoine; Padovani, Luca; Pham, Long; Rodríguez Carbonell, Enric; Rémy, Didier; Saville, Philip; Stanford, Caleb; Stein, Dario; Veltri, Niccolò; Wang, Di




# Logics for Extensional, Locally Complete Analysis via Domain Refinements<sup>*</sup>

Flavio Ascari, Roberto Bruni, and Roberta Gori

Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, Italy, flavio.ascari@phd.unipi.it, {roberto.bruni,roberta.gori}@unipi.it

Abstract. Abstract interpretation is a framework to design sound static analyses by over-approximating the set of program behaviours. While over-approximations can prove correctness, they cannot witness incorrectness because false alarms may arise. An ideal, but uncommon, situation is completeness of the abstraction that can ensure no false alarm is introduced by the abstract interpreter. Local Completeness Logic is a proof system that can decide both correctness and incorrectness of a program: any provable triple ⊢<sub>A</sub> [P] c [Q] in the logic implies completeness of an intensional abstraction of program c on input P and is such that Q can be used to decide (in)correctness. However, completeness itself is an extensional property of the function computed by the program, while the above intensional analysis depends on the way the program is written and therefore not all valid triples can be derived in the proof system. Our main contribution is the study of new inference rules which allow one to perform part of the intensional analysis in a more precise abstract domain, and then to transfer the result back to the coarser domain. With these new rules, all (extensionally) valid triples can be derived in the proof system, thus untying the set of provable properties from the way the program is written.

Keywords: Abstract interpretation, Completeness in abstract interpretation, Hoare logic, Abstract domain refinement, Extensionality

# 1 Introduction

Static program analysis has been widely used to help developers produce valid software. Among static analysis techniques, abstract interpretation [6,7] is a general formalism to define sound-by-construction over-approximations that has been successfully applied in many fields, such as model checking, security and optimization [8]. Static analyses are often defined as over-approximations, that is, the analysis computes a superset of the behaviors. This leads to no false negatives, that is, all issues of the software are identified by the analysis, but it can cause false alarms: an incorrect behavior may be an artifact of the analysis, added by the over-approximation. While the absence of false negatives allowed a wide applicability of abstract interpretation techniques, it also makes tools less reliable at identifying bugs. In fact, in many industrial applications any false alarm reported by the analysis to the developers diminishes its credibility, making it less effective in practice. This argument has recently led to the development of a logic of under-approximations, called incorrectness logic [16,17].

<sup>*</sup> Research supported by MIUR PRIN Project 201784YSZ5 ASPRA–Analysis of Program Analyses.

The Problem. In abstract interpretation, an ideal situation is completeness. Given an expressible specification, that is, one represented exactly in the abstract domain, a complete abstraction reports no false alarms. In its most widespread formulation [7], completeness is a global property: a program c is complete in the abstraction A if a condition holds for all possible inputs. Let C be the concrete domain and ⟦c⟧ : C → C be the (collecting) denotational semantics of c. Given an abstract domain A, a concretization function γ : A → C and an abstraction function α : C → A, an abstract interpreter ⟦c⟧<sup>♯</sup><sub>A</sub> : A → A is complete in A if for all possible inputs P we have ⟦c⟧<sup>♯</sup><sub>A</sub> α(P) = α(⟦c⟧P). Unfortunately, because of universal quantification over the possible inputs, this condition is difficult to meet in practice. Moreover, in most cases completeness is checked on an intensional abstraction of ⟦c⟧ computed inductively on the syntax by an abstract interpreter ⟦c⟧<sup>♯</sup><sub>A</sub>, making completeness an intensional property dependent on the program syntax [10]. However, in principle completeness is an extensional property, that only depends on the best correct abstraction ⟦c⟧<sup>A</sup> of ⟦c⟧ in A, defined by ⟦c⟧<sup>A</sup> ≜ α⟦c⟧γ. We sum up what we may call intensional (on the left) and extensional (on the right) completeness in the following equations:

$$[\![\mathsf{c}]\!]^{\sharp}_{A}\,\alpha = \alpha\,[\![\mathsf{c}]\!] \qquad\qquad [\![\mathsf{c}]\!]^{A}\,\alpha = \alpha\,[\![\mathsf{c}]\!]\,\gamma\alpha = \alpha\,[\![\mathsf{c}]\!] \tag{1}$$

We show the difference between ⟦c⟧<sup>A</sup> and ⟦c⟧<sup>♯</sup><sub>A</sub> in the following example.

Example 1 (Extensional and intensional properties). Consider the concrete domain of sets of integers and the abstract domain of signs:

The meaning of the abstract elements of Sign is to represent concrete values that satisfy the respective property. So for instance, denoting with the function γ the "meaning" of an abstract element, we have γ(ℤ<sub><0</sub>) = {n ∈ ℤ | n < 0}. Conversely, α "abstracts" a concrete set of values to the least abstract property describing it, for instance α({0; 1; 100}) = ℤ<sub>≥0</sub>.

Consider the simple program fragment c ≜ x := x + 1; x := x - 1. Its denotational semantics ⟦c⟧ is the identity function id<sub>ℤ</sub>, so its best correct abstraction is the abstract identity id<sub>Sign</sub> = α id<sub>ℤ</sub> γ. This is an extensional property of the program because it only depends on the function it computes, i.e., its denotational semantics. However, an analyzer does not know the semantics of c, so it has to analyze the program syntactically, breaking it down into elementary pieces and gluing the results together. So for instance, starting from the concrete point P = {1} the analysis first abstracts it to the property α(P) = ℤ<sub>>0</sub>, then it computes

$$\begin{aligned} [\![\mathsf{c}]\!]^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0}) &= [\![\mathsf{x} := \mathsf{x} - 1]\!]^{\sharp}_{\mathsf{Sign}}\, [\![\mathsf{x} := \mathsf{x} + 1]\!]^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0}) \\ &= [\![\mathsf{x} := \mathsf{x} - 1]\!]^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0}) = \mathbb{Z}_{\geq 0}. \end{aligned}$$

Analogous calculations for all properties in Sign yield the abstraction

$$[\![\mathsf{c}]\!]^{\sharp}_{\mathsf{Sign}}(a) = \begin{cases} \bot & \text{if } a = \bot \\ \mathbb{Z}_{\geq 0} & \text{if } a \in \{\boxed{\mathbb{Z}_{=0}}, \boxed{\mathbb{Z}_{>0}}, \mathbb{Z}_{\geq 0}\} \\ \mathbb{Z}_{<0} & \text{if } a = \mathbb{Z}_{<0} \\ \top & \text{if } a \in \{\boxed{\mathbb{Z}_{\leq 0}}, \boxed{\mathbb{Z}_{\neq 0}}, \top\} \end{cases}$$

that, albeit sound, is less precise than id<sub>Sign</sub> (we box all inputs on which ⟦c⟧<sup>♯</sup><sub>Sign</sub> loses accuracy). If instead the program were written as c′ ≜ skip, the analysis in Sign would yield the best correct abstraction ⟦c′⟧<sup>♯</sup><sub>Sign</sub> = id<sub>Sign</sub>. Therefore, the abstraction depends on how the program is written and not only on its semantics: this is what is called an intensional property (see e.g. [1] for more about intensional and extensional abstract properties).
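The calculation in Example 1 can be replayed mechanically. The sketch below is only an illustration and not part of the formal development: it assumes an encoding of each Sign element as the set of sign classes it allows (the names `NEG`, `ZERO`, `POS`, `INC`, `DEC` and `c_sharp` are hypothetical), and it composes the abstract transfer functions of the two assignments in program order.

```python
from itertools import combinations

# Assumption: each Sign element is encoded as the set of sign classes it allows,
# so bottom is the empty set and top contains all three classes.
NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = [frozenset(c) for r in range(4) for c in combinations((NEG, ZERO, POS), r)]

NAMES = {
    frozenset(): "bot",
    frozenset({NEG}): "Z<0",
    frozenset({ZERO}): "Z=0",
    frozenset({POS}): "Z>0",
    frozenset({NEG, ZERO}): "Z<=0",
    frozenset({NEG, POS}): "Z!=0",
    frozenset({ZERO, POS}): "Z>=0",
    frozenset({NEG, ZERO, POS}): "top",
}

# Possible sign classes of n + 1 and of n - 1, given the sign class of n.
INC = {NEG: {NEG, ZERO}, ZERO: {POS}, POS: {POS}}
DEC = {NEG: {NEG}, ZERO: {NEG}, POS: {ZERO, POS}}


def lift(table):
    """Abstract transfer function: join the table entries of every allowed class."""
    def transfer(a):
        out = set()
        for cls in a:
            out |= table[cls]
        return frozenset(out)
    return transfer


inc_sharp = lift(INC)  # abstraction of x := x + 1
dec_sharp = lift(DEC)  # abstraction of x := x - 1


def c_sharp(a):
    """Syntax-directed analysis of c = (x := x + 1; x := x - 1)."""
    return dec_sharp(inc_sharp(a))


# Inputs on which the analysis is strictly less precise than the abstract identity.
lossy = {NAMES[a] for a in SIGN if c_sharp(a) != a}

for a in SIGN:
    print(f"{NAMES[a]:>5} -> {NAMES[c_sharp(a)]}")
```

Under this encoding the abstract order is set inclusion, so `a <= c_sharp(a)` for every element expresses that the syntactic analysis is a sound but possibly coarser approximation of the identity.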

To overcome the former limitation of "global" completeness, the concept of local completeness [2], which is relative to a specific input, has been recently proposed. While this condition is much more common in practice, it is also much more complex to prove. In order to do so, the authors of [2] introduce a Local Completeness Logic parametric with respect to an abstraction A (LCL<sub>A</sub> for short), that is able to prove triples ⊢<sub>A</sub> [P] c [Q] with the following meaning:

1. Q is an under-approximation of the concrete semantics ⟦c⟧P,
2. Q and ⟦c⟧P have the same over-approximation in A,
3. A is locally complete for the intensional abstraction ⟦c⟧<sup>♯</sup><sub>A</sub> on input P.

The important consequence of the previous points is the fact that a triple in LCL<sub>A</sub> is able to prove both correctness and incorrectness of a program with respect to a specification Spec expressible in A. By point (2), if the abstract analysis reports no errors in Q then there are none because of the over-approximation. However, if the analysis does report an issue, this must be present in the abstraction of ⟦c⟧P as well, which is the same as the abstraction of Q: this means that Q contains a witness of the violation of Spec, and this witness must be in ⟦c⟧P because of the under-approximation ensured by point (1). While local completeness of point (3) is a key property to prove points (1–2), it would be enough to guarantee that (3) holds for the extensional best correct approximation ⟦c⟧<sup>A</sup> of ⟦c⟧ rather than for the intensional abstract interpreter ⟦c⟧<sup>♯</sup><sub>A</sub>: this suggests that it is possible to weaken the hypothesis (3) in order to make the proof system able to derive more valid triples.

Main Contributions. Building on the proof system of LCL<sub>A</sub>, we add new rules to relax point (3) to local completeness of the extensional abstraction ⟦c⟧<sup>A</sup>. This way, while the proof system itself remains intensional as it deduces program properties by working inductively on the syntax, the information it produces is more precise. Specifically, since the property associated with triples is extensional, no precision is lost because of the intensional abstract interpreter, which in the end allows us to prove more triples. In order to achieve this goal, we introduce new rules to dynamically refine the abstract domain during the analysis. While in general an analysis in a more concrete domain is more precise, LCL<sub>A</sub> requires local completeness, which is not necessarily preserved by domain refinement [11]. For instance, a common way to combine two different abstract domains is their reduced product [7], but it is not always the case that the analysis in the reduced product is (locally) complete, even when it is such in the two domains.

To preserve local completeness, we introduce several rules for domain refinement in LCL<sub>A</sub> and compare their expressiveness and usability. All of them provide extensional guarantees, in the sense that point (3) is replaced with local completeness of the best correct abstraction ⟦c⟧<sup>A</sup> on input P. The first one is called (refine-ext). LCL<sub>A</sub> extended with (refine-ext) turns out to be logically complete: any triple satisfying the above conditions (1–3) can be proved in our proof system. This is a theoretical improvement with respect to LCL<sub>A</sub>, that instead was intrinsically incomplete as a logic, i.e., for all abstractions A there exists a sound triple that cannot be proved. While (refine-ext) is theoretically interesting, one of its hypotheses is infeasible to check in practice. To improve applicability, we propose two derived rules, (refine-int) and (refine-pre), whose premises can be checked effectively and imply the hypotheses of the more general (refine-ext). Surprisingly, it turns out that (refine-int) enjoys a logical completeness result too, while (refine-pre) is strictly weaker (in terms of strength of the logic, see Example 6). Despite this, the latter is much simpler and preferable to use in practice whenever possible (see Example 5), while the former can be used in more situations and is at times the best choice.

We present a pictorial comparison among the expressiveness of the various proof systems in Fig. 1. Each node represents the proof system LCL<sub>A</sub> extended with one rule (the bottom one being plain LCL<sub>A</sub>). An arrow in the picture means a more powerful proof system, i.e., a proof system that can prove more triples, with its label pointing out the result justifying the claim. The two arrows between the two topmost nodes are because the two proof systems are logically equivalent, i.e., they can prove the same triples.

Structure of the paper. In Section 2 we explain the notation used in the paper and recall the basics of abstract interpretation. In Section 3 we present LCL<sub>A</sub>, mostly summarizing the content of [2], with a focus on what is used in the following sections. In Section 4 we present and compare our new rules to refine the abstract domain, namely (refine-ext) and the two derived rules (refine-int) and (refine-pre). We conclude in Section 5. Some proofs and technical examples are in Appendix A.

Fig. 1: Relations between the new proof systems

# 2 Background

Notation. We write P(S) for the powerset of S and id<sub>S</sub> : S → S for the identity function on a set S, with subscripts omitted when obvious from the context. If f : S → T is a function, we overload the symbol f to denote also its lifting f : P(S) → P(T) defined as f(X) = {f(x) | x ∈ X} for any X ⊆ S. Given two functions f : S → T and g : T → V we denote their composition as g ◦ f or simply gf. For a function f : S → S, we denote f<sup>n</sup> : S → S the composition of f with itself n times, i.e. f<sup>0</sup> = id<sub>S</sub> and f<sup>n+1</sup> = f ◦ f<sup>n</sup>.

In ordered structures, such as posets and lattices, with carrier set C, we denote the ordering with ≤<sub>C</sub>, least upper bounds (lubs) with ⊔<sub>C</sub>, greatest lower bounds (glbs) with ⊓<sub>C</sub>, least element with ⊥<sub>C</sub> and greatest element with ⊤<sub>C</sub>. For all these, we omit the subscript when evident from the context. Any powerset is a complete lattice ordered by set inclusion. In this case, we use standard symbols ⊆, ∪, etc. Given a poset T and two functions f, g : S → T, the notation f ≤ g means that, for all s ∈ S, f(s) ≤<sub>T</sub> g(s). A function f between complete lattices is additive (resp. co-additive) if it preserves arbitrary lubs (resp. glbs).

#### 2.1 Abstract Interpretation

Abstract interpretation [6,7,5] is a general framework to define static analyses that are sound by construction. The main idea is to approximate the program semantics on some abstract domain A instead of working on the concrete domain C. The main tools used to study abstract interpretations are Galois connections. Given two complete lattices C and A, a pair of monotone functions α : C → A and γ : A → C defines a Galois connection (GC) when

$$
\forall c \in C, a \in A. \quad \alpha(c) \le_A a \iff c \le_C \gamma(a).
$$

We call C and A the concrete and the abstract domain respectively, α the abstraction function and γ the concretization function. The functions α and γ are also called adjoints. For any GC, it holds id<sub>C</sub> ≤ γα, αγ ≤ id<sub>A</sub>, γ is co-additive and α is additive. A concrete value c ∈ C is called expressible in A if γα(c) = c. We only consider GCs in which αγ = id<sub>A</sub>, called Galois insertions (GIs). In a GI α is onto and γ is injective. A GI is said to be trivial if A is isomorphic to the concrete domain or if it is the singleton {⊤<sub>A</sub>}.
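As a sanity check of these definitions, the following sketch verifies the Galois connection condition by brute force. It is an illustration under stated assumptions: the integers are replaced by a small finite universe, Sign elements are encoded as sets of sign classes ordered by inclusion, and all names (`UNIVERSE`, `alpha`, `gamma`, and so on) are hypothetical.

```python
from itertools import combinations

# Assumption: a finite stand-in for the integers, so that every check terminates.
UNIVERSE = (-2, -1, 0, 1, 2)


def sign_class(n):
    return "neg" if n < 0 else "zero" if n == 0 else "pos"


def powerset(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


CONCRETE = powerset(UNIVERSE)                  # concrete domain: sets of values
ABSTRACT = powerset(("neg", "zero", "pos"))    # abstract domain: the 8 Sign elements


def alpha(c):
    """Least abstract element describing the concrete set c."""
    return frozenset(sign_class(n) for n in c)


def gamma(a):
    """All concrete values allowed by the abstract element a."""
    return frozenset(n for n in UNIVERSE if sign_class(n) in a)


# Galois connection: alpha(c) <= a  iff  c <= gamma(a), for all c and a.
is_gc = all((alpha(c) <= a) == (c <= gamma(a)) for c in CONCRETE for a in ABSTRACT)

# Galois insertion: alpha after gamma is the identity on the abstract domain.
is_insertion = all(alpha(gamma(a)) == a for a in ABSTRACT)

# gamma after alpha is extensive: c <= gamma(alpha(c)).
is_extensive = all(c <= gamma(alpha(c)) for c in CONCRETE)

# A value is expressible when gamma(alpha(c)) == c.
expressible = [c for c in CONCRETE if gamma(alpha(c)) == c]

print(is_gc, is_insertion, is_extensive, len(expressible))
```

In this finite setting exactly one concrete set per abstract element is expressible, which reflects that γ is injective in a Galois insertion.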

We overload the symbol A to denote also the function γα : C → C: this is always a closure operator, that is a monotone, increasing (i.e. c ≤ A(c) for all c) and idempotent function. In the following, we use closure operators as much as possible to simplify the notation. Particularly, they are useful to denote domain refinements, as exemplified in the next paragraph. Note that they are still very expressive because γ is injective: for instance A(c) = A(c′) if and only if α(c) = α(c′). Nonetheless, the use of closure operators is only a matter of notation and it is always possible to rewrite them using the adjoints.

We use Abs(C) to denote the set of abstract domains over C, and we write A<sub>α,γ</sub> ∈ Abs(C) when we need to make the two maps α and γ explicit (we omit them when not needed). Given two abstract domains A<sub>α,γ</sub>, A′<sub>α′,γ′</sub> ∈ Abs(C) over C, we say A′ is a refinement of A, written A′ ⪯ A, when γ(A) ⊆ γ′(A′). When this happens, the abstract domain A′ is more expressive than A, and in particular for all concrete elements c ∈ C the inequality A′(c) ≤<sub>C</sub> A(c) holds.

Abstracting Functions. Given a monotone function f : C → C and an abstract domain A<sub>α,γ</sub> ∈ Abs(C), a function f<sup>♯</sup> : A → A is a sound approximation (or abstraction) of f if αf ≤ f<sup>♯</sup>α. Its best correct approximation (bca) is f<sup>A</sup> = αfγ, and it is the most precise of all the sound approximations of f: a function f<sup>♯</sup> is a sound approximation of f if and only if f<sup>A</sup> ≤ f<sup>♯</sup>.

A sound abstraction f<sup>♯</sup> of f is complete if αf = f<sup>♯</sup>α. It turns out that there exists a complete abstraction f<sup>♯</sup> if and only if the bca f<sup>A</sup> is complete. If this is the case, we say that the abstract domain A is complete for f and denote it with ℂ<sup>A</sup>(f). Intuitively, completeness means that the abstract function f<sup>♯</sup> is as precise as possible in the given abstract domain A, and in program analysis this allows greater confidence in the alarms raised. We remark that A is complete for f if and only if αf = f<sup>A</sup>α = αfγα. Since γ is injective, this is true if and only if γαf = γαfγα, so that we define the (global) completeness property ℂ<sup>A</sup>(f) as follows:

$$
\mathbb{C}^A(f) \iff Af = AfA.
$$
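The closure-operator formulation Af = AfA lends itself to a brute-force check on small examples. The sketch below is an illustration, not the paper's method: it assumes a finite universe in place of the integers, the Sign closure on it, and two hypothetical sample functions, `negate` and a saturating increment `sat_inc` (chosen so that results stay inside the finite universe).

```python
from itertools import combinations

# Assumption: a finite stand-in for the integers.
UNIVERSE = (-2, -1, 0, 1, 2)
SUBSETS = [frozenset(s) for r in range(len(UNIVERSE) + 1)
           for s in combinations(UNIVERSE, r)]


def sign_class(n):
    return "neg" if n < 0 else "zero" if n == 0 else "pos"


def closure(c):
    """The closure operator A = gamma . alpha of the Sign domain on UNIVERSE."""
    classes = {sign_class(n) for n in c}
    return frozenset(n for n in UNIVERSE if sign_class(n) in classes)


def lift(f):
    """Lift a function on values to a function on sets of values."""
    return lambda c: frozenset(f(n) for n in c)


def counterexamples(f):
    """Inputs c on which A f (c) differs from A f A (c)."""
    return [c for c in SUBSETS if closure(f(c)) != closure(f(closure(c)))]


def is_complete(f):
    """Global completeness: A f = A f A on every input."""
    return not counterexamples(f)


negate = lift(lambda n: -n)
sat_inc = lift(lambda n: min(n + 1, 2))  # hypothetical saturating increment

print(is_complete(negate))    # negation only swaps the sign classes
print(is_complete(sat_inc))   # incrementing can cross the zero boundary
print(sorted(sorted(c) for c in counterexamples(sat_inc)))
```

For the saturating increment, the input {-1} is a counterexample: applying the function first gives {0}, whereas closing first yields {-2, -1}, whose image {-1, 0} has a strictly larger closure.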

#### 2.2 Regular Commands

Following [2] (see also [16]) we consider a language of regular commands:

$$\mathsf{Reg} \ni \mathsf{r} ::= \mathsf{e} \mid \mathsf{r}; \mathsf{r} \mid \mathsf{r} \oplus \mathsf{r} \mid \mathsf{r}^*$$

This is a general language and can be instantiated differently by changing the set Exp of basic transfer expressions e. These determine the kind of operations allowed in the language, and in our examples we assume to have deterministic assignments and boolean guards. Using standard definitions for arithmetic and boolean expressions a ∈ AExp and b ∈ BExp, we consider

$$\mathsf{Exp} \ni \mathsf{e} \text{ ::= } \mathsf{skip} \mid \mathsf{x} \; := \mathsf{a} \mid \mathsf{b} ?$$

skip does nothing, x := a is a standard deterministic assignment. The semantics of b? is that of an "assume" statement: if its input satisfies b it does nothing, otherwise it diverges. The term r;r represents the usual sequential composition, and r⊕r is nondeterministic choice. The Kleene star r<sup>∗</sup> denotes a nondeterministic iteration, where r can be executed any number of times (possibly 0) before exiting. It can be thought of as the solution of the recursive equation r<sup>∗</sup> ≡ skip ⊕ (r; r<sup>∗</sup>). We write r<sup>n</sup> to denote sequential composition of r with itself n times, analogously to how we use f<sup>n</sup> for function composition.

This formulation can accommodate for a standard imperative programming language [18] defining if and while statements as

$$\begin{aligned} \text{if } \begin{array}{l} \text{if } \begin{array}{l} \text{then } \mathsf{c}\_{1} \ \mathsf{else} \ \mathsf{c}\_{2} \ \stackrel{\scriptstyle \Delta}{=} \end{array} \end{aligned} \left( \begin{array}{l} \mathsf{b}? \ \mathsf{c}\_{1} \end{array} \right) \oplus \left( \begin{array}{l} \neg \mathsf{b} \end{array} \right) ? \end{aligned} \left( \begin{array}{l} \neg \mathsf{b} \end{array} \right) \oplus \left( \begin{array}{l} \neg \mathsf{b} \end{array} \right) \left( \begin{array}{l} \neg \mathsf{b} \end{array} \right) \end{aligned}$$
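The encoding of conditionals and loops into regular commands can be sketched as a small syntax tree with two desugaring helpers. This is only an illustration: the class and function names (`Basic`, `Seq`, `Choice`, `Star`, `if_then_else`, `while_do`, `show`) are hypothetical, and guards are kept as plain strings rather than parsed expressions.

```python
from dataclasses import dataclass


# Hypothetical AST for regular commands; names are illustrative only.
@dataclass(frozen=True)
class Basic:
    text: str          # a basic transfer expression: skip, x := a, or b?


@dataclass(frozen=True)
class Seq:
    first: object      # r1 in r1; r2
    second: object     # r2 in r1; r2


@dataclass(frozen=True)
class Choice:
    left: object       # r1 in r1 ⊕ r2
    right: object      # r2 in r1 ⊕ r2


@dataclass(frozen=True)
class Star:
    body: object       # r in r*


def assume(b: str) -> Basic:
    """Build the guard b? from a boolean expression given as text."""
    return Basic(f"({b})?")


def neg(b: str) -> str:
    """Textual negation of a boolean expression."""
    return f"!({b})"


def if_then_else(b: str, c1, c2):
    # if (b) then c1 else c2  =  (b?; c1) ⊕ ((¬b)?; c2)
    return Choice(Seq(assume(b), c1), Seq(assume(neg(b)), c2))


def while_do(b: str, c):
    # while (b) do c  =  (b?; c)*; (¬b)?
    return Seq(Star(Seq(assume(b), c)), assume(neg(b)))


def show(r) -> str:
    """Pretty-print a regular command."""
    if isinstance(r, Basic):
        return r.text
    if isinstance(r, Seq):
        return f"{show(r.first)}; {show(r.second)}"
    if isinstance(r, Choice):
        return f"({show(r.left)}) ⊕ ({show(r.right)})"
    return f"({show(r.body)})*"


loop = while_do("x > 1", Basic("x := x - 1"))
print(show(loop))  # ((x > 1)?; x := x - 1)*; (!(x > 1))?
```

The printed loop has the same shape as the unfolded while loop used for r<sub>2</sub> in Example 5.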

Concrete semantics. We assume the semantics ⦅·⦆ : Exp → C → C of basic transfer expressions on a complete lattice C to be additive. We believe this assumption is not restrictive, as it is always satisfied in collecting semantics. For our instantiation of Exp, we consider a finite set of variables Var and the set of stores Σ = Var → Z, that is, (total) functions σ from Var to integers. The complete lattice C is then defined simply as P(Σ) with the usual poset structure given by set inclusion. Given a store σ ∈ Σ, store update σ[x ↦ v] is defined as usual for x ∈ Var and v ∈ Z. We consider standard, inductively defined semantics ⦅·⦆ for arithmetic and boolean expressions. The concrete semantics of regular commands ⟦·⟧ : Reg → C → C is defined inductively as in Fig. 2a, where the semantics of basic transfer expressions e ∈ Exp is defined as follows:

$$\begin{aligned} (\!|\mathsf{skip}|\!)S &\triangleq S \\ (\!|\mathsf{x} := \mathsf{a}|\!)S &\triangleq \{\sigma[x \mapsto (\!|\mathsf{a}|\!)\sigma] \mid \sigma \in S\} \\ (\!|\mathsf{b}?|\!)S &\triangleq \{\sigma \in S \mid (\!|\mathsf{b}|\!)\sigma = \mathsf{tt}\} \end{aligned}$$

Abstract semantics. The (compositional) abstract semantics of regular commands ⟦·⟧<sup>♯</sup><sub>A</sub> : Reg → A → A on an abstract domain A ∈ Abs(C) is defined inductively as in Fig. 2b. As is common for abstract interpreters, we assume the analyser knows the best correct abstraction of basic expressions and thus is able to compute ⟦e⟧<sup>A</sup>. A straightforward proof by structural induction shows that the abstract semantics is monotone and sound w.r.t. ⟦r⟧ (i.e., α⟦r⟧ ≤ ⟦r⟧<sup>♯</sup><sub>A</sub>α). However, in general it is less precise than the bca, i.e., ⟦r⟧<sup>♯</sup><sub>A</sub> ≠ ⟦r⟧<sup>A</sup> = α⟦r⟧γ.

$$\begin{array}{ll} \llbracket \mathsf{e} \rrbracket c \triangleq (\!|\mathsf{e}|\!) c & \llbracket \mathsf{e} \rrbracket^{\sharp}_{A} a \triangleq \llbracket \mathsf{e} \rrbracket^{A} a \\ \llbracket \mathsf{r}_1; \mathsf{r}_2 \rrbracket c \triangleq \llbracket \mathsf{r}_2 \rrbracket \llbracket \mathsf{r}_1 \rrbracket c & \llbracket \mathsf{r}_1; \mathsf{r}_2 \rrbracket^{\sharp}_{A} a \triangleq \llbracket \mathsf{r}_2 \rrbracket^{\sharp}_{A} \llbracket \mathsf{r}_1 \rrbracket^{\sharp}_{A} a \\ \llbracket \mathsf{r}_1 \oplus \mathsf{r}_2 \rrbracket c \triangleq \llbracket \mathsf{r}_1 \rrbracket c \vee \llbracket \mathsf{r}_2 \rrbracket c & \llbracket \mathsf{r}_1 \oplus \mathsf{r}_2 \rrbracket^{\sharp}_{A} a \triangleq \llbracket \mathsf{r}_1 \rrbracket^{\sharp}_{A} a \vee_A \llbracket \mathsf{r}_2 \rrbracket^{\sharp}_{A} a \\ \llbracket \mathsf{r}^* \rrbracket c \triangleq \bigvee_{n \ge 0} \llbracket \mathsf{r} \rrbracket^{n} c & \llbracket \mathsf{r}^* \rrbracket^{\sharp}_{A} a \triangleq {\textstyle\bigvee_A} \{ (\llbracket \mathsf{r} \rrbracket^{\sharp}_{A})^{n} a \mid n \ge 0 \} \\[4pt] \text{(a) Concrete semantics} & \text{(b) Abstract semantics} \end{array}$$

Fig. 2: Concrete and abstract semantics of regular commands, side by side
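The concrete (collecting) semantics of Fig. 2a can be prototyped with a few combinators over finite sets of stores. The sketch below is a simplification under stated assumptions: stores are single integers (the case Σ = Z), the combinator names are hypothetical, and `star` terminates only when the set of reachable stores is finite. It relies on additivity to apply the loop body to newly discovered stores only.

```python
from typing import Callable, FrozenSet

# A semantic function maps a set of stores to a set of stores.
Sem = Callable[[FrozenSet[int]], FrozenSet[int]]


def assign(update: Callable[[int], int]) -> Sem:
    """Collecting semantics of a deterministic assignment."""
    return lambda S: frozenset(update(s) for s in S)


def assume(pred: Callable[[int], bool]) -> Sem:
    """Collecting semantics of a guard b?: keep the stores satisfying b."""
    return lambda S: frozenset(s for s in S if pred(s))


def seq(f: Sem, g: Sem) -> Sem:
    """Semantics of r1; r2: run r1, then r2."""
    return lambda S: g(f(S))


def choice(f: Sem, g: Sem) -> Sem:
    """Semantics of r1 ⊕ r2: join (union) of both branches."""
    return lambda S: f(S) | g(S)


def star(f: Sem) -> Sem:
    """Semantics of r*: union of all finite iterates of f on S."""
    def run(S: FrozenSet[int]) -> FrozenSet[int]:
        reached = frozenset(S)
        frontier = reached
        while frontier:                      # stops once no new store appears
            frontier = f(frontier) - reached
            reached = reached | frontier
        return reached
    return run


# while (x > 0) { x := x - 1 }  encoded as  ((x > 0)?; x := x - 1)*; (x <= 0)?
body = seq(assume(lambda x: x > 0), assign(lambda x: x - 1))
loop = seq(star(body), assume(lambda x: x <= 0))

print(sorted(star(body)(frozenset({3}))))  # [0, 1, 2, 3]
print(sorted(loop(frozenset({3}))))        # [0]
```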

Shorthands. Throughout the paper, we present some simple examples of program analysis. The programs discussed in the examples contain just one or two variables (usually x and y), so we denote their sets of stores simply as Σ = Z or Σ = Z<sup>2</sup>. In these cases, the convention is that an element of Z is the value of the single variable in Var, and a pair (n, m) ∈ Z<sup>2</sup> denotes the store σ(x) = n, σ(y) = m. We also lift these conventions to sets of values in Z or Z<sup>2</sup>. At times, to improve readability, we use logical formulas such as (y ∈ {1; 2; 99} ∧ x = y), possibly using intervals as in x ∈ [0; 5], to describe sets of stores.

# 3 Local Completeness Logic

In this section we present the notion of local completeness and introduce the proof system LCL<sub>A</sub> (Local Completeness Logic on A) as defined in [2].

For a generic program and abstract domain, global completeness is too strong a requirement: for conditionals to be complete, the abstract domain should basically contain a complete sublattice of the concrete domain. For this reason, the weaker notion of local completeness can be more convenient in many cases.

Definition 1 (Local completeness, cf. [2]). Let f : C → C be a concrete function, c ∈ C a concrete point and A ∈ Abs(C) an abstract domain for C. Then A is locally complete for f on c, written ℂ<sup>A</sup><sub>c</sub>(f), iff

$$Af(c) = AfA(c).$$
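As an illustration of this definition, the following sketch checks local completeness in the interval domain for a guard on finite sets of integers. It is a toy under stated assumptions: the closure Int is represented extensionally by the set of integers in the hull, and the names `hull`, `locally_complete` and `positive` are hypothetical helpers, not part of any analyser.

```python
def hull(S):
    """Interval closure Int(S) of a finite set of integers, as a set."""
    S = set(S)
    return set(range(min(S), max(S) + 1)) if S else set()


def locally_complete(f, c):
    """Check the equation Int f(c) == Int f Int(c)."""
    return hull(f(c)) == hull(f(hull(c)))


def positive(S):
    """Collecting semantics of the guard (x > 0)?."""
    return {x for x in S if x > 0}


# On {-3, 5} the guard is not locally complete: the hull adds 1..4,
# which survive the guard and enlarge the abstract result.
print(locally_complete(positive, {-3, 5}))     # False
# Adding the point 1 to the input restores local completeness.
print(locally_complete(positive, {-3, 1, 5}))  # True
```

The second call shows why the choice of the concrete point matters: the same function in the same domain is locally complete on one input and not on another.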

A remarkable difference between global and local completeness is that, while the former can be proved compositionally irrespective of the input [10], the latter depends on it. Consequently, to carry out a compositional proof of local completeness, information on the input to each subpart of the program is also required, i.e., all traversed states are important. However, local completeness enjoys an "abstract convexity" property: local completeness on a concrete point c implies local completeness on any concrete point d between c and its abstraction A(c). This observation has been exploited in the design of the proof system LCL<sub>A</sub>. The system is able to prove triples ⊢<sub>A</sub> [P] r [Q] ensuring that:

1. Q is an under-approximation of the concrete semantics ⟦r⟧P,
2. Q and ⟦r⟧P have the same abstraction in A,
3. A is locally complete for ⟦r⟧ on input P.


Fig. 3: The proof system LCL<sub>A</sub>.


The second point means that, given a specification Spec expressible in A, any provable triple ⊢<sub>A</sub> [P] r [Q] either proves correctness of r with respect to Spec or exposes some alerts in Q \ Spec. These in turn correspond to true alerts because of the first point, as spelled out by Corollary 1 below.

The proof system is defined in Fig. 3. The crux of the proof system is to constrain the under-approximation Q to have the same abstraction as the concrete semantics ⟦r⟧P, as for instance explicitly required in rule (relax). By the abstract convexity property mentioned above, this means that local completeness of ⟦r⟧ on the under-approximation P of the concrete store is enough to prove local completeness.

The three key properties (1–3) listed above are formalized by the following (intensional) soundness result:

Theorem 1 (Soundness, cf. [2]). Let A<sub>α,γ</sub> ∈ Abs(C). If ⊢<sub>A</sub> [P] r [Q] then:

1. Q ≤ ⟦r⟧P,
2. α(⟦r⟧P) = α(Q),
3. ⟦r⟧<sup>♯</sup><sub>A</sub>α(P) = α(Q).

As a consequence of this theorem, given a specification expressible in the abstract domain A, a provable triple ⊢<sub>A</sub> [P] r [Q] can determine both correctness and incorrectness of the program r:

Corollary 1 (Proofs of Verification, cf. [2]). Let A<sub>α,γ</sub> ∈ Abs(C) and a ∈ A. If ⊢<sub>A</sub> [P] r [Q] then

$$\llbracket \mathsf{r} \rrbracket P \le \gamma(a) \iff Q \le \gamma(a).$$

The corollary is useful in program analysis and verification because, given a specification a expressible in A and a provable triple ⊢<sub>A</sub> [P] r [Q], it allows us to distinguish two cases.

– If Q ⊆ γ(a), then also ⟦r⟧P ⊆ γ(a), so that the program is correct with respect to the specification.

– If Q ⊈ γ(a), then also ⟦r⟧P ⊈ γ(a), which means that ⟦r⟧P \ γ(a) is not empty and thus contains a true alert of the program. Moreover, since Q ⊆ ⟦r⟧P we have that Q \ γ(a) ⊆ ⟦r⟧P \ γ(a), so that already Q pinpoints some issues.

To better show how this works, we briefly introduce the following example (discussed also in [2], where all details of the derivation can be found).

Example 2. Consider the concrete domain C = P(Z), the abstract domain Int of intervals, the precondition P = {1; 999} and the command r ≜ (r<sub>1</sub> ⊕ r<sub>2</sub>)<sup>∗</sup>, where

$$\begin{aligned} \mathsf{r}_1 &\triangleq (\mathsf{x} > 0)?;\ \mathsf{x} := \mathsf{x} - 1 \\ \mathsf{r}_2 &\triangleq (\mathsf{x} < 1000)?;\ \mathsf{x} := \mathsf{x} + 1 \end{aligned}$$

In LCL<sub>A</sub> it is possible to prove the triple ⊢<sub>Int</sub> [P] r [Q], whose postcondition is Q = {0; 2; 1000}. Consider the two specifications Spec = (x ≤ 1000) and Spec′ = (x ≥ 100). The triple is then able to prove correctness of Spec and incorrectness of Spec′. For the former, observe that Q ⊆ Spec. By Corollary 1 we then know ⟦r⟧P ⊆ Spec, that is, correctness. For the latter, Q exhibits two witnesses to the violation of Spec′, namely 0, 2 ∈ Q \ Spec′. By point (1) of soundness we then know that 0, 2 ∈ Q ⊆ ⟦r⟧P are true alerts. □
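The claims of Example 2 can be checked by brute force, since the reachable values form a finite set. The sketch below is only a sanity check, not a derivation in the proof system. It reads the guard of r<sub>1</sub> as x > 0, an assumption that is consistent with the stated postcondition Q, and the helper names `step`, `star` and `interval` are hypothetical.

```python
def step(x):
    """One execution of r1 ⊕ r2 on a single value of x."""
    out = set()
    if x > 0:
        out.add(x - 1)       # r1: (x > 0)?; x := x - 1   (guard assumed)
    if x < 1000:
        out.add(x + 1)       # r2: (x < 1000)?; x := x + 1
    return out


def star(P):
    """Values reachable from P by any number of iterations of step."""
    reached, frontier = set(P), set(P)
    while frontier:
        frontier = {y for x in frontier for y in step(x)} - reached
        reached |= frontier
    return reached


def interval(S):
    """Interval abstraction of a finite non-empty set, as (min, max)."""
    return (min(S), max(S))


P = {1, 999}
Q = {0, 2, 1000}
post = star(P)                                   # concrete semantics of r on P

print(Q <= post)                                 # True: Q under-approximates
print(interval(Q) == interval(post))             # True: same abstraction
print(all(x <= 1000 for x in Q))                 # True: Q is within Spec
print(sorted(x for x in Q if not x >= 100))      # [0, 2]: witnesses against Spec'
```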

Strictly speaking, the proof of Corollary 1 only relies on points (1–2) of Theorem 1. Point (3) is in turn needed to ensure the first two, but extensional completeness would suffice to this aim. This means that we can weaken the soundness theorem (logically speaking, we prove a weaker conclusion, so the theorem as an implication is weaker) while still preserving the validity of Corollary 1. To this end, we propose a new soundness result involving extensional completeness: the important difference is that in point (3) we use the best correct abstraction ⟦r⟧<sup>A</sup> in place of the inductively defined ⟦r⟧<sup>♯</sup><sub>A</sub>. Theorem 1 involves ⟦r⟧<sup>♯</sup><sub>A</sub>, an intensional property of the program r that depends on how the program is written (see Example 1 or Example 1 in Section 5 of [13]), while the new statement we propose relies only on ⟦r⟧<sup>A</sup>, an extensional property of the computed function ⟦r⟧ and not of r itself. For this reason, in the rest of the paper we use the name intensional soundness for Theorem 1, and extensional soundness for the following Theorem 2.

Theorem 2 (Extensional soundness). Let A<sub>α,γ</sub> ∈ Abs(C). If ⊢<sub>A</sub> [P] r [Q] then:

1. Q ≤ ⟦r⟧P,
2. α(⟦r⟧P) = α(Q),
3. ⟦r⟧<sup>A</sup>α(P) = α(Q).

Lastly, we remark that the original LCL<sub>A</sub> is intrinsically logically incomplete ([2], cf. Theorem 5.12): for every non-trivial abstraction A there exists a triple that is intensionally sound (satisfies points (1–3) of Theorem 1) but cannot be proved in LCL<sub>A</sub>. We will discuss logical (in)completeness for our extensional framework in Section 4.1.

$$\frac{\vdash_{A'} [P]\ \mathsf{r}\ [Q] \quad A' \preceq A \quad A \llbracket \mathsf{r} \rrbracket^{A'} A(P) = A(Q)}{\vdash_A [P]\ \mathsf{r}\ [Q]}\ (\text{refine-ext})$$

Fig. 4: Rule refine for LCL<sub>A</sub>.

# 4 Refining Abstract Domain

LCL<sub>A</sub> can prove a triple [P] r [Q] for some Q only when ⟦r⟧<sup>♯</sup><sub>A</sub> is locally complete, that is ⟦r⟧<sup>♯</sup><sub>A</sub>α(P) = α(⟦r⟧P) (see Theorem 1). Since ⟦r⟧<sup>♯</sup><sub>A</sub> is computed in a compositional way, the above condition strictly depends on how r is written: to prove the local completeness of ⟦r⟧<sup>♯</sup><sub>A</sub>, we need to prove that all its syntactic components are locally complete, which is an intensional property. However, the goal of the analysis is to study the behaviour of the function ⟦r⟧, not how it is encoded by r. Hence, our aim is to enhance the original proof system in order to handle triples where the extensional abstraction ⟦r⟧<sup>A</sup> is proved to be locally complete w.r.t. the given input, that is ⟦r⟧<sup>A</sup>α(P) = α(⟦r⟧P). To this end, we extend the proof system with a new inference rule, shown in Fig. 4. It is named "refine" because it allows us to refine the abstract domain A to some A′ ⪯ A, and "ext" since it involves the extensional bca ⟦r⟧<sup>A′</sup> of ⟦r⟧ in A′ (to distinguish it from the rules we will introduce in Section 4.2).

Using (refine-ext) it is possible to construct a derivation that proves local completeness of portions of the whole program in a more precise abstract domain A′ and then carries the result over to the global analysis in a coarser domain A. The only requirement for the application of the rule is that the domain A′ is chosen in such a way that A⟦r⟧<sup>A′</sup>A(P) = A(Q) is satisfied.

Formally, given the two abstract domains A<sub>α,γ</sub>, A′<sub>α′,γ′</sub> ∈ Abs(C), this last premise of rule (refine-ext) should be written as αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q) to match function domains and codomains. However, we prefer the more concise, albeit slightly imprecise, notation used in Fig. 4. That notation is justified by the following intuitive argument: since A′ ⪯ A we can consider, with a slight abuse of notation (seeing abstract domains as closures), A ⊆ A′ ⊆ C, so that for any element a ∈ A ⊆ C we have γ(a) = γ′(a) = a and for any c ∈ C we have α′A(c) = A(c). With these, it follows that

$$
\alpha \gamma' \llbracket \mathsf{r} \rrbracket^{A'} \alpha' A(P) = \alpha \llbracket \mathsf{r} \rrbracket^{A'} A(P) = A \llbracket \mathsf{r} \rrbracket^{A'} A(P).
$$

With rule (refine-ext) we cannot prove intensional soundness (Theorem 1): since this rule allows us to perform part of the analysis in a more concrete domain A′, we do not get any information on ⟦r⟧<sup>♯</sup><sub>A</sub>. However, we can prove extensional soundness (Theorem 2) and get all the benefits of Corollary 1.

Theorem 3 (Extensional soundness of (refine-ext)). The proof system in Fig. 3 with the addition of rule (refine-ext) (see Fig. 4) is extensionally sound (cf. Theorem 2).

We also remark that a rule like (refine-ext), which allows part of the proof to be carried out in a different abstract domain, cannot come unconstrained. We present an example showing that a similar inference rule, only requiring the triple [P] r [Q] to be provable in an abstract domain A′ ⪯ A without any other constraint, would be unsound.

Example 3. Consider the concrete domain C = P(Z) of integers, the point P = {−5; −1}, the abstract domain Sign of Example 1 and the program

$$\mathsf{r} \triangleq \mathsf{x} := \mathsf{x} + 10.$$

Then C ⪯ Sign and we can prove ⊢<sub>C</sub> [P] r [{5; 9}] by applying (transfer), since all assignments are locally complete in the concrete domain. However, if f = ⟦r⟧ = ⦅x := x + 10⦆, it is not the case that ℂ<sup>Sign</sup><sub>P</sub>(f): indeed

$$\text{Sign}(f(\text{Sign}(P))) = \text{Sign}(f(\mathbb{Z}\_{<0})) = \text{Sign}(\{n \in \mathbb{Z} \, | \, n < 10\}) = \top$$

while

$$\mathsf{Sign}(f(P)) = \mathsf{Sign}(\{5, 9\}) = \mathbb{Z}\_{\geq 0}.$$

This means that a rule without any additional condition can prove a triple which is not locally complete, hence it is unsound. □

#### 4.1 Logical Completeness

Among all the possible conditions that can be added to a rule like (refine-ext), we believe ours to be very general since, unlike in the original LCL<sub>A</sub> proof system (see Section 5.2 of [2]), the introduction of (refine-ext) allows us to derive a logical completeness result, i.e., the ability to prove any triple satisfying the soundness properties guaranteed by the proof system.

However, to prove such a result, our extension needs an additional rule to handle loops, just like the original LCL<sub>A</sub> and Incorrectness Logic [16]. The necessary infinitary rule, called (limit), allows the proof system to handle the Kleene star, and is the same as in LCL<sub>A</sub>:

$$\frac{\forall n \in \mathbb{N}.\ \vdash_A [P_n]\ \mathsf{r}\ [P_{n+1}]}{\vdash_A [P_0]\ \mathsf{r}^*\ [\bigvee_{i \in \mathbb{N}} P_i]}\ (\text{limit}).$$

Theorem 4 (Logical completeness of (refine-ext)). Consider the proof system of Fig. 3 with the addition of rules (refine-ext) and (limit). If Q ≤ ⟦r⟧P and ⟦r⟧<sup>A</sup>α(P) = α(Q) then ⊢<sub>A</sub> [P] r [Q].

The previous theorem proves the logical completeness of our proof system with respect to the property of extensional soundness. Indeed, if Q ≤ ⟦r⟧P and ⟦r⟧<sup>A</sup>α(P) = α(Q) we also have:

$$
\alpha(Q) \le \alpha(\llbracket \mathsf{r} \rrbracket P) \le \llbracket \mathsf{r} \rrbracket^A \alpha(P) = \alpha(Q),
$$

hence all three conditions of Theorem 2 are satisfied.
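The two inequalities in this chain can be unpacked as follows. This is only a sketch of the routine steps, using monotonicity of α and of ⟦r⟧, the fact that γα is extensive, and the definition ⟦r⟧<sup>A</sup> = α⟦r⟧γ of the bca:

$$\begin{aligned} Q \le \llbracket \mathsf{r} \rrbracket P &\implies \alpha(Q) \le \alpha(\llbracket \mathsf{r} \rrbracket P) && \text{($\alpha$ is monotone)} \\ P \le \gamma\alpha(P) &\implies \alpha(\llbracket \mathsf{r} \rrbracket P) \le \alpha(\llbracket \mathsf{r} \rrbracket \gamma\alpha(P)) = \llbracket \mathsf{r} \rrbracket^{A} \alpha(P) && \text{($\alpha\llbracket \mathsf{r} \rrbracket$ is monotone)} \end{aligned}$$

Since the chain starts and ends with α(Q), antisymmetry forces every term to be equal, which gives in particular α(⟦r⟧P) = α(Q).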

An interesting consequence of this result is the existence of a refinement A′ in which it is possible to carry out the proof. In principle such a refinement could be the concrete domain C (as shown in the proof in Appendix A), which is not computable. However, it is worth noting that for a sequential fragment (a portion of code without loops) the concrete domain can actually be used (for instance via first-order logic). This opens up the possibility, for instance, to infer a loop invariant on the body using C, and then prove it using an abstract domain. In Section 4.3 we discuss this issue further.

#### 4.2 Derived Refinement Rules

The hypothesis A⟦r⟧<sup>A′</sup>A(P) = A(Q) is added to rule (refine-ext) in order to guarantee soundness: in general, the ability to prove a triple such as [P] r [Q] in a refined domain A′ only gives information on A⟦r⟧<sup>A′</sup>A′(P) but not on A⟦r⟧<sup>A′</sup>A(P). In fact, Example 4 shows that A⟦r⟧<sup>A′</sup>A′(P) and A⟦r⟧<sup>A′</sup>A(P) can be different.

Example 4. Consider the concrete domain P(Z), the abstract domain of signs Sign<sub>α,γ</sub> ∈ Abs(P(Z)) (introduced in Example 1) and its refinement Sign<sub>1</sub> below

For the command r ≜ x := x - 1 and the concrete point P = {1} we have

$$\mathsf{Sign}[\mathfrak{r}]^{\mathsf{Sign}\_1} \mathsf{Sign}\_1(P) = \mathsf{Sign}[\mathfrak{r}]^{\mathsf{Sign}\_1}(\mathbb{Z}\_{=1}) = \mathbb{Z}\_{=0}$$

but

$$\mathsf{Sign}[\mathfrak{r}]^{\mathsf{Sign}\_1} \mathsf{Sign}(P) = \mathsf{Sign}[\mathfrak{r}]^{\mathsf{Sign}\_1} (\mathbb{Z}\_{\geq 0}) = \mathbb{Z}\_{\geq 0}. \tag{7}$$

Despite being necessary, the hypothesis of rule (refine-ext) cannot be checked in practice because the bca ⟦r⟧<sup>A′</sup> of a composite command r is not known by the analyser. To mitigate this issue, we present two derived rules whose premises imply the premises of rule (refine-ext), hence ensuring extensional soundness by means of Theorem 3.

The first rule we present replaces the requirement on the extensional bca ⟦r⟧<sup>A′</sup> with requirements on the intensional compositional abstraction ⟦r⟧<sup>♯</sup><sub>A′</sub> computed in A′. For this reason, we call this rule (refine-int).

Proposition 1. The following rule (refine-int) is extensionally sound:

$$\frac{\vdash_{A'} [P]\ \mathsf{r}\ [Q] \quad A' \preceq A \quad A \llbracket \mathsf{r} \rrbracket^{\sharp}_{A'} A(P) = A(Q)}{\vdash_{A} [P]\ \mathsf{r}\ [Q]}\ (\text{refine-int})$$

It is worth noting that now the condition on the compositional abstraction ⟦r⟧<sup>♯</sup><sub>A′</sub> can easily be checked by the analyser, possibly alongside the analysis of r with LCL or using a stand-alone abstract interpreter. Moreover, this rule is as powerful as the original (refine-ext) because it allows us to prove a logical completeness result akin to Theorem 4.

Theorem 5 (Logical completeness of (refine-int)). Consider the proof system of Fig. 3 with the addition of rules (refine-int) and (limit). If Q ≤ ⟦r⟧P and ⟦r⟧<sup>A</sup>α(P) = α(Q) then ⊢<sub>A</sub> [P] r [Q].

Just like logical completeness for (refine-ext), this result implies the existence of a refinement A′ in which it is possible to carry out the proof (possibly the concrete domain C). The discussion about how to find one is sketched in Section 4.3.

The second derived rule we propose is simpler than (refine-ext), as it just checks the abstractions A(P) and A′(P), with no reference to the regular command r nor to the postcondition Q. Since the premise is only on the precondition P, we call this rule (refine-pre).

Proposition 2. The following rule (refine-pre) is extensionally sound:

$$\frac{\vdash_{A'} [P]\ \mathsf{r}\ [Q] \quad A' \preceq A \quad A'(P) = A(P)}{\vdash_A [P]\ \mathsf{r}\ [Q]}\ (\text{refine-pre})$$

Rule (refine-pre) only requires a simple check at the application site instead of an expensive analysis of the program r, so it can be preferred in practice.
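The side condition A′(P) = A(P) of (refine-pre) is indeed a plain comparison of two abstractions. The sketch below illustrates it for Int and a one-variable version of the refinement Int<sub>≠0</sub> (intervals possibly with a hole in 0, as used in the examples of this section), with closures represented extensionally on finite sets of integers. The function names are hypothetical, and restricting to a single variable is a simplifying assumption.

```python
def int_closure(S):
    """Int(S): all integers between min(S) and max(S)."""
    S = set(S)
    return set(range(min(S), max(S) + 1)) if S else set()


def int_nonzero_closure(S):
    """Int≠0(S): the interval hull of S, with 0 removed when 0 is not in S."""
    closed = int_closure(S)
    return closed if 0 in S else closed - {0}


def refine_pre_applicable(refined, base, P):
    """Side condition of (refine-pre): A'(P) == A(P)."""
    return refined(P) == base(P)


P_full = set(range(-100, 101))   # y in [-100; 100], contains 0
P_hole = {-1, 1}                 # does not contain 0

print(refine_pre_applicable(int_nonzero_closure, int_closure, P_full))  # True
print(refine_pre_applicable(int_nonzero_closure, int_closure, P_hole))  # False
```

On the second input the refined domain is strictly more precise than Int, so the rule cannot be used to switch to it at that program point.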

We present an example to highlight the advantages of this rule (as well as (refine-int)), which allows us to use different domains in the proof derivation of different parts of the program.

Example 5 (The use of (refine-pre)). Consider the two program fragments

$$\begin{aligned} \mathsf{r}_1 &\triangleq (\mathsf{y} \mathrel{!=} 0)?;\ \mathsf{y} := \mathsf{abs}(\mathsf{y}) \\ \mathsf{r}_2 &\triangleq \mathsf{x} := \mathsf{y};\ \mathsf{while}\ (\mathsf{x} > 1)\ \{\ \mathsf{y} := \mathsf{y} - 1;\ \mathsf{x} := \mathsf{x} - 1\ \} \\ &= \mathsf{x} := \mathsf{y};\ ((\mathsf{x} > 1)?;\ \mathsf{y} := \mathsf{y} - 1;\ \mathsf{x} := \mathsf{x} - 1)^*;\ (\mathsf{x} \mathrel{<=} 1)? \end{aligned}$$

and the program r ≜ r<sub>1</sub>; r<sub>2</sub>. Here abs is a function computing the absolute value, and we assume, for the sake of simplicity, that the analyser knows its best abstraction. Consider the concrete domain P(Z<sup>2</sup>), where a pair (n, m) denotes a state x = n, y = m, and the initial state P = (y ∈ [−100; 100]), a logical description of the concrete point {(n, m) | m ∈ [−100; 100]} ∈ P(Z<sup>2</sup>). The bca ⟦r⟧<sup>Int</sup> in the abstract domain of intervals is locally complete on P (since P is expressible in Int), but the compositional abstraction ⟦r⟧<sup>♯</sup><sub>Int</sub> is not:

$$\begin{aligned} \llbracket \mathsf{r} \rrbracket^{\mathsf{Int}}\alpha(P) &= \mathsf{Int}(\llbracket \mathsf{r}_2 \rrbracket \llbracket \mathsf{r}_1 \rrbracket(\{(n,m) \mid m \in [-100; 100]\})) \\ &= \mathsf{Int}(\llbracket \mathsf{r}_2 \rrbracket(\{(n,m) \mid m \in [1; 100]\})) \\ &= \mathsf{Int}(\{(1,1)\}) \\ &= ([1;1] \times [1;1]), \end{aligned}$$

$$\dfrac{\dfrac{\mathbb{C}^{\mathsf{Int}_{\neq 0}}_{P}(\llbracket \mathsf{y} \mathrel{!=} 0? \rrbracket)}{\vdash_{\mathsf{Int}_{\neq 0}} [P]\ \mathsf{y} \mathrel{!=} 0?\ [R_1]}\ (\text{transfer}) \qquad \dfrac{\dfrac{\mathbb{C}^{\mathsf{Int}_{\neq 0}}_{R_1}(\llbracket \mathsf{y} := \mathsf{abs}(\mathsf{y}) \rrbracket)}{\vdash_{\mathsf{Int}_{\neq 0}} [R_1]\ \mathsf{y} := \mathsf{abs}(\mathsf{y})\ [y \in [1; 100]]}\ (\text{transfer})}{\vdash_{\mathsf{Int}_{\neq 0}} [R_1]\ \mathsf{y} := \mathsf{abs}(\mathsf{y})\ [R]}\ (\text{relax})}{\vdash_{\mathsf{Int}_{\neq 0}} [P]\ \mathsf{r}_1\ [R]}\ (\text{seq})$$

Fig. 5: Derivation of ⊢<sub>Int≠0</sub> [P] r<sub>1</sub> [R] for Example 5.

while

$$\begin{aligned} \llbracket \mathsf{r} \rrbracket^{\sharp}_{\mathsf{Int}}\alpha(P) &= \llbracket \mathsf{r}_2 \rrbracket^{\sharp}_{\mathsf{Int}} \llbracket \mathsf{r}_1 \rrbracket^{\sharp}_{\mathsf{Int}}([-\infty; +\infty] \times [-100; 100]) \\ &= \llbracket \mathsf{r}_2 \rrbracket^{\sharp}_{\mathsf{Int}} \llbracket \mathsf{y} := \mathsf{abs}(\mathsf{y}) \rrbracket^{\mathsf{Int}}([-\infty; +\infty] \times [-100; 100]) \\ &= \llbracket \mathsf{r}_2 \rrbracket^{\sharp}_{\mathsf{Int}}([-\infty; +\infty] \times [0; 100]) \\ &= ([1; 1] \times [0; 100]) \neq ([1; 1] \times [1; 1]). \end{aligned}$$
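The loss of precision on r<sub>1</sub> can be reproduced on the variable y alone. The sketch below is a simplification under stated assumptions: it ignores x, represents interval closures extensionally on finite sets, and uses hypothetical helper names. It contrasts abstracting once after the whole of r<sub>1</sub> (the bca) with abstracting after each basic command (the compositional analysis).

```python
def hull(S):
    """Interval closure Int(S) of a finite set of integers, as a set."""
    S = set(S)
    return set(range(min(S), max(S) + 1)) if S else set()


def guard_nonzero(S):
    """Collecting semantics of (y != 0)?."""
    return {y for y in S if y != 0}


def assign_abs(S):
    """Collecting semantics of y := abs(y)."""
    return {abs(y) for y in S}


P = set(range(-100, 101))        # y in [-100; 100]

# Extensional (bca): run r1 concretely on Int(P), abstract only at the end.
extensional = hull(assign_abs(guard_nonzero(hull(P))))

# Compositional: abstract after every basic command.
after_guard = hull(guard_nonzero(hull(P)))   # the hole in 0 is filled again
compositional = hull(assign_abs(after_guard))

print(min(extensional), max(extensional))      # 1 100
print(min(compositional), max(compositional))  # 0 100
```

The spurious value 0 appears because the interval hull taken after the guard cannot record that 0 was excluded.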

The issues are twofold. First, the analysis of r<sub>1</sub> in Int is incomplete, so we need a more concrete domain. For instance Int<sub>≠0</sub>, the Moore closure of Int with the addition of the element Z<sub>≠0</sub> representing the property of being nonzero, would work. Intuitively, Int<sub>≠0</sub> contains all intervals, possibly having a "hole" in 0. Formally

$$\mathsf{Int}_{\neq 0} = \mathsf{Int} \cup \{ I_{\neq 0} \mid I \in \mathsf{Int} \},$$

with γ′(I<sub>≠0</sub>) = γ(I) \ {0}. Note, however, that there is no need for a relational domain to analyze r<sub>1</sub>, since variable x is never mentioned in it. On the contrary, the analysis of r<sub>2</sub> requires a relational domain to track the information that the value of variable x is equal to the value of variable y. This suggests, for instance, using the octagons domain Oct [15] to analyze r<sub>2</sub>. It is worth noting that the domain of octagons Oct would not be able to perform a locally complete analysis of r<sub>1</sub>, for the same reasons that the domain Int could not.

However, rule (refine-pre) allows us to combine these different proof derivations. Since the program state between r<sub>1</sub> and r<sub>2</sub> can be precisely represented in Int, we use this domain as a baseline and refine it to Int<sub>≠0</sub> and Oct for the two parts respectively.

Let R = (y ∈ {1; 2; 100}), which is an under-approximation of the concrete state in between r<sub>1</sub> and r<sub>2</sub> with the same abstraction in Int, so we can prove the triple ⊢<sub>Int</sub> [P] r<sub>1</sub> [R]. Note that the concrete point 2 was added to R in order to have local completeness for (x > 1)? in r<sub>2</sub>. However, this triple cannot be proved in Int because ⟦r<sub>1</sub>⟧<sup>♯</sup><sub>Int</sub> is not locally complete on P, so we resort to (refine-pre) to change the domain to Int<sub>≠0</sub>. The full derivation in Int<sub>≠0</sub> is shown in Fig. 5, where R<sub>1</sub> = (y ∈ [−100; 100] ∧ y ≠ 0) and we omitted for simplicity the additional hypothesis of (relax).

Again ⟦r<sub>2</sub>⟧ is locally complete on R in Int, but the compositional analysis ⟦r<sub>2</sub>⟧<sup>♯</sup><sub>Int</sub> is not. Hence, to perform the derivation we resort to (refine-pre) to introduce relational information in the abstract domain, using Oct instead of Int. Let Q = (x = 1 ∧ y = 1), that is the concrete output of the program, so that we can prove ⊢<sub>Int</sub> [R] r<sub>2</sub> [Q]. The derivation of this triple is only in Appendix A, Fig. 6. However, the proof is just a straightforward application of rules (seq), (iterate) and (transfer).

With these two derivations, the proof of the triple ⊢<sub>Int</sub> [P] r [Q] is straightforward using (refine-pre):

$$\dfrac{\vdash_{\mathsf{Int}_{\neq 0}} [P]\ \mathsf{r}_1\ [R]}{\vdash_{\mathsf{Int}} [P]\ \mathsf{r}_1\ [R]}\ (\text{refine-pre}) \qquad \dfrac{\vdash_{\mathsf{Oct}} [R]\ \mathsf{r}_2\ [Q]}{\vdash_{\mathsf{Int}} [R]\ \mathsf{r}_2\ [Q]}\ (\text{refine-pre})$$

For the derivation to fit the page, we write here the additional hypotheses of the rules. For the first application, Int<sub>≠0</sub> ⪯ Int and Int<sub>≠0</sub>(P) = P = Int(P). For the second, Oct ⪯ Int and Int(R) = (y ∈ [1; 100]) = Oct(R).

It is worth noting that, in this example, all applications of (refine-pre) can be replaced by (refine-int). This means that the latter is also able to exploit Int<sub>≠0</sub> and Oct to prove the triple in the very same way, but its application requires more expensive abstract analyses than the simple checks of (refine-pre). □

While (refine-pre) is simpler than (refine-ext) and (refine-int), it is also weaker in both a theoretical and practical sense. On the one hand, LCL<sup>A</sup> extended with this rule does not admit a logical completeness result; on the other hand, there are situations in which, even though (refine-pre) allows a derivation, the other rules are more effective. We show these two points by examples. For the first, we propose a sound triple that LCL<sup>A</sup> extended with (refine-pre) cannot prove. Since the example is quite technical, here we only sketch the idea, and leave the details only in Appendix A, Example 8.

Example 6 (Logical incompleteness of (refine-pre)). Consider the concrete domain C = P(Z) of integers, the abstract domain Int of intervals, the concrete point P = {−1, 1} and commands r<sup>1</sup> , x != 0?, r<sup>2</sup> , x >= 0? and r , r1;r2. Then the triple `Int [P] r1;r<sup>2</sup> [{1}] is sound but cannot be proved in LCL<sup>A</sup> extended with (refine-pre).

The key observations for this example are two. First, all strict subset P <sup>0</sup> ⊂ P are such that Int(P 0 ) ⊂ Int(P). Moreover, for all refinements A<sup>0</sup> Int such that A<sup>0</sup> (P) = Int(P) we have the same condition, namely if P <sup>0</sup> ⊂ P then A0 (P 0 ) ⊂ A<sup>0</sup> (P). This is because for all P <sup>0</sup> ⊂ P we have A<sup>0</sup> (P 0 ) ⊆ Int(P 0 ) ⊂ Int(P) = A<sup>0</sup> (P). Second, <sup>J</sup>r<sup>1</sup>K<sup>P</sup> <sup>=</sup> <sup>P</sup>. This means that all triples appearing in the derivation tree of `Int [P] r1;r<sup>2</sup> [{1}] have the same precondition P. Since (refine-pre) requires A<sup>0</sup> (P) = Int(P), all possible applications of this rule change the abstract domain to some A<sup>0</sup> satisfying the condition above. Since LCL<sup>A</sup> computes under-approximations with the same abstraction of the strongest postcondition, these two observations make it impossible to under-approximate P further, both with (relax) and (refine-pre). This in turn make the triple not provable because <sup>J</sup>r<sup>2</sup><sup>K</sup> is not locally complete on <sup>P</sup> in Int or in any refinement satisfying

 $A'(P) = \mathsf{Int}(P)$ : 
$$A' \| \mathsf{r}\_2 \| (P) = A'(\{1\}) \subseteq \mathsf{Int}(\{1\}) = \{1\}$$
  $A' \| \mathsf{r}\_2 \| A'(P) \supseteq \{\mathsf{r}\_2\}$  $A'(P) = \{\mathsf{r}\_2\}$ (\mathsf{Int}(P)) = \{0, 1\}.

Example 8 in Appendix A exhibits the formal argument showing that this triple cannot be proved. ⊓⊔
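The failure of local completeness used in this example can be checked mechanically. The following is a minimal sketch, assuming that intervals can be represented by their finite sets of points; the helper names `interval_hull` and `guard_nonneg` are hypothetical and not part of the paper.

```python
def interval_hull(s):
    """Best interval abstraction Int(S) of a finite set of integers, as a set of points."""
    if not s:
        return frozenset()
    return frozenset(range(min(s), max(s) + 1))

def guard_nonneg(s):
    """Concrete semantics of the test `x >= 0?`: keep the states that pass it."""
    return frozenset(x for x in s if x >= 0)

P = frozenset({-1, 1})

# Int [[r2]] (P): abstract the exact result of the test.
lhs = interval_hull(guard_nonneg(P))
# Int [[r2]] Int(P): run the test on the abstraction of P, then abstract.
rhs = interval_hull(guard_nonneg(interval_hull(P)))

print(sorted(lhs))  # [1]
print(sorted(rhs))  # [0, 1]
print(lhs == rhs)   # False: r2 is not locally complete on P in Int
```

The spurious point 0 enters only because Int(P) = [−1; 1] contains it, which is exactly the imprecision that (refine-pre) cannot remove while keeping A′(P) = Int(P).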

As a corollary, this example (and, more generally, logical incompleteness) shows that it is not always possible to find a refinement A′ to carry out the proof using (refine-pre). Another consequence of this incompleteness result is that, even when a command is locally complete in an abstract domain A, we may need to reason about properties that are not expressible in A in order to prove it, as (refine-pre) may not be sufficient.

Second, we present an example to illustrate that there are situations in which (refine-int) is more practical than (refine-pre), even though they are both able to prove the same triple.

Example 7. Consider the two program fragments

$$\begin{aligned} \mathsf{r}\_1 &\triangleq (\mathbf{y} \mathbin{!=} \mathbf{0})\mathbf{?};\ \mathbf{x} := \mathbf{y};\ \mathbf{y} := \mathbf{abs}(\mathbf{y}) \\ \mathsf{r}\_2 &\triangleq \mathbf{x} := \mathbf{y};\ \mathbf{while}\ (\mathbf{x} > \mathbf{1})\ \{\, \mathbf{y} := \mathbf{y} - \mathbf{1};\ \mathbf{x} := \mathbf{x} - \mathbf{1} \,\} \end{aligned}$$

and the program r ≜ r<sub>1</sub>;r<sub>2</sub>. Consider also the initial state P = y ∈ [−100; 100].

This example is a variation of Example 5: the difference is the introduction of the relational dependency x := y in r<sub>1</sub>, which is partially stored in the postcondition R of r<sub>1</sub>. Because of this, Oct(R) and Int(R) are different, so we cannot apply (refine-pre) to prove [R] r<sub>2</sub> [Q] for some Q.

Following Example 5, the domain Int<sub>≠0</sub> is able to infer on r<sub>1</sub> a subset R of the strongest postcondition y ∈ [1; 100] ∧ y = abs(x) with the same abstraction Int<sub>≠0</sub>(R) = [−100; 100]<sub>≠0</sub> × [1; 100]. However, for any such R we cannot use (refine-pre) to prove the triple ⊢<sub>Int</sub> [R] r<sub>2</sub> [x = 1 ∧ y = 1] via Oct, because Int(R) = x ∈ [−100; 100] ∧ y ∈ [1; 100] while Oct(R) = 1 ≤ y ≤ 100 ∧ −y ≤ x ≤ y. More generally, any subset of the strongest postcondition contains the relational information y = abs(x), so relational domains like octagons and polyhedra [9] do not have the same abstraction as the non-relational Int, preventing the use of (refine-pre). However, we can apply (refine-int): considering R = (y ∈ {1; 2; 100} ∧ y = abs(x)), Q = (x = 1 ∧ y = 1) and r<sub>w</sub> ≜ while (x > 1) { y := y - 1; x := x - 1 }, we have

$$\begin{aligned} \mathsf{Int} [\![\mathsf{r}\_2]\!]^{\sharp}\_{\mathsf{Oct}} \mathsf{Int}(R) &= \mathsf{Int} [\![\mathsf{r}\_2]\!]^{\sharp}\_{\mathsf{Oct}} (\mathbf{x} \in [-100; 100] \wedge \mathbf{y} \in [1; 100]) \\ &= \mathsf{Int} [\![\mathsf{r}\_w]\!]^{\sharp}\_{\mathsf{Oct}} [\![\mathbf{x} := \mathbf{y}]\!]^{\sharp}\_{\mathsf{Oct}} (\mathbf{x} \in [-100; 100] \wedge \mathbf{y} \in [1; 100]) \\ &= \mathsf{Int} [\![\mathsf{r}\_w]\!]^{\sharp}\_{\mathsf{Oct}} (1 \le \mathbf{y} \le 100 \wedge \mathbf{y} = \mathbf{x}) \\ &= \mathsf{Int} (\mathbf{x} = 1 \wedge \mathbf{y} = 1) \\ &= \mathsf{Int}(Q). \end{aligned}$$

In this example, rule (refine-pre) can be applied to prove the triple, but it requires relational information from the assignment x := y in r<sub>1</sub>, hence forcing the use of a relational domain (e.g., the reduced product [7] of Oct and Int<sub>≠0</sub>) for the whole r, making the analysis more expensive. ⊓⊔
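The difference between Int(R) and Oct(R) that blocks (refine-pre) can be made concrete on the finite R chosen above. This is only an illustrative sketch: the functions `box`, `octagon` and `in_octagon` are hypothetical helpers, and the octagon is represented simply by bounds on x, y, x + y and x − y.

```python
def box(points):
    """Interval (box) abstraction: independent bounds on x and y."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs)), (min(ys), max(ys))

def octagon(points):
    """Octagon abstraction: bounds on x, y, x + y and x - y."""
    (xlo, xhi), (ylo, yhi) = box(points)
    sums = [x + y for x, y in points]
    diffs = [x - y for x, y in points]
    return {
        "x": (xlo, xhi), "y": (ylo, yhi),
        "x+y": (min(sums), max(sums)),
        "x-y": (min(diffs), max(diffs)),
    }

def in_octagon(oct_, x, y):
    """Membership test for a point in the octagon given by its bounds."""
    vals = {"x": x, "y": y, "x+y": x + y, "x-y": x - y}
    return all(lo <= vals[k] <= hi for k, (lo, hi) in oct_.items())

# R = (y in {1, 2, 100} and y = abs(x)), as pairs (x, y).
R = [(x, y) for y in (1, 2, 100) for x in (-y, y)]

print(box(R))                         # ((-100, 100), (1, 100))
print(octagon(R)["x+y"])              # (0, 200), i.e. x >= -y
print(octagon(R)["x-y"])              # (-200, 0), i.e. x <= y
print(in_octagon(octagon(R), 100, 1)) # False, although (100, 1) lies in the box
```

The point (100, 1) belongs to Int(R) but not to Oct(R), so the two abstractions of R differ, as stated in the example.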

#### 4.3 Choosing The Refinement

All three new rules allow combining different domains in the same derivation, but they do not define an algorithm because the choice of the right refinement to use is nondeterministic. A crucial point for their applicability is a strategy to select the refined abstract domain. While we have not addressed this problem yet, we believe there are some interesting starting points in the literature.

As already anticipated in previous sections, we settled the question from a theoretical point of view. The logical completeness results for (refine-ext) (Theorem 4) and (refine-int) (Theorem 5) imply the existence of a domain in which it is possible to complete the proof (if this were not the case, the proof could not be completed in any domain, contradicting logical completeness). However, the proofs of those theorems exhibit the concrete domain C as an example, which is infeasible in general. Dually, as (refine-pre) is logically incomplete (Example 6), there are triples that cannot be proved in any domain with it.

As more practical alternatives, we envisage some possibilities. First, we are studying relationships with counterexample-guided abstraction refinement (CEGAR) [4], which is a technique that exploits refinement in the context of abstract model checking. However, CEGAR and our approach seem complementary. On the one hand, our refinement rules allow a dynamic change of domain, during the analysis and only for a part of it, while CEGAR performs a static refinement and then a new analysis of the whole transition system in the new, more precise domain. On the other hand, our rules lack an instantiation technique, while for CEGAR there are effective algorithms available to pick a suitable refinement.

Second, local completeness shells [3] were proposed as an analogue of completeness shells [11] for local completeness. In that article, the authors proposed to use local completeness shells to perform abstract interpretation repair, a technique to refine the abstract domain depending on the program to analyse, just like CEGAR does for abstract model checking. Abstract interpretation repair works well with LCL<sub>A</sub>, and could be a way to decide the best refinement for one of our rules in the presence of a failed local completeness proof obligation. The advantage of combining repair with our new rules is the possibility of discarding the refined domain just after its use in a subderivation instead of using it to carry out the whole derivation. Investigations in this direction are ongoing.

Another related approach, which shares some common ground with CEGAR, is Lazy (Predicate) Abstraction [12,14]. Both our approach and Lazy Abstraction exploit different abstract domains for different parts of the proof, refining them as needed. The key difference is that Lazy Abstraction unwinds the control flow graph (CFG) of the program (with techniques to handle loops) while we work inductively on the syntax. This means that, when Lazy Abstraction refines a domain, it must use it from that point onward (unless it finds a loop invariant). On the other hand, our method can change abstract domain even for different parts of sequential code. However, the technique used in Lazy Abstraction (basically to trace a counterexample back with a theorem prover until it is either found to be spurious or proved to be true) could be applicable to LCL<sub>A</sub>: a failed local completeness proof obligation in (transfer) can be traced back with a theorem prover and the failed proof can be used to understand how to refine the abstract domain.

Table 1: Comparison of the proof systems

# 5 Conclusions

In this paper, we have proposed a logical framework to prove both correctness and incorrectness of a program exploiting locally complete abstractions. Indeed, from any provable triple [P] r [Q] we can either prove that r meets an expressible specification Spec or find a concrete counterexample in Q. Differently from the original LCL<sub>A</sub> [2], which was proved to be intensionally sound, our framework is extensionally sound, meaning that it is able to prove more properties about programs. To achieve this, our inference rules are based on the best correct abstraction of a program behaviour instead of a generic abstract interpreter. The key feature of our proof systems is the ability to exploit different abstract domains to analyse different portions of the whole program. In particular, the domains are selected among the refinements of a chosen abstract domain from which the analysis begins. The main advantage of our extensional approach is the possibility of proving many triples that could not be proved in LCL<sub>A</sub> because of the way the program is written. In more detail, we presented three new rules to refine the abstract domain, each of which can be added independently to the proof system with different complexity-precision trade-offs.

Table 1 summarizes the properties LCL<sub>A</sub> enjoys when extended with different rules, and Figure 1 from the Introduction graphically compares the logical strength of these proof systems. (refine-ext) is the most general rule, from which the other two, (refine-int) and (refine-pre), are derived. The former turns out to be as strong as (refine-ext), since they are both logically complete, while the latter is simpler to use, although weaker.

Future work. In principle, completeness could be achieved either by refining or by simplifying the abstract domain [11]. In this article we have only focused on refinement rules for local completeness, but we are investigating some simplification rules as well as their relation to the ones presented in this paper. To date, domain simplification seems theoretically weaker, but apparently it can accommodate techniques useful in practice that are beyond the reach of refinement rules.

While the new rules we introduced are relevant from both a theoretical and a practical point of view, they do not define an algorithm because of their nondeterminism: we need techniques to determine when a change of abstract domain is needed and how to choose the most convenient new domain. We believe these two issues are actually related. For instance, if the analysis is unable to satisfy a local completeness proof obligation to apply (transfer), a heuristic may determine both what additional information is needed to make it true (i.e., how to refine the abstract domain) and where that additional information came from (i.e., when to refine). We briefly discussed in Section 4.3 some possibilities to perform this choice. Ideally, one would systematically select an off-the-shelf abstract domain best suited to deal with each code fragment, and the heuristic would inspect the proof obligations and either exploit some sort of catalog that tracks suitable abstract domains that are locally complete for the code and input at hand, or derive on-the-fly some convenient domain refinement as done, e.g., by partition refinement. To this aim, we intend to investigate a mutual exchange of ideas between CEGAR and our approach, and to integrate abstract interpretation repair into our framework.

Acknowledgments. We thank the anonymous referees for their helpful comments, which helped us to improve the presentation and the discussion of related work.

# Appendix A Proofs and Supplementary Material

#### A.1 Extensional Soundness (Theorem 2)

Proof (Proof of Theorem 2). First we remark that points (1) and (3) imply point (2):

$$\begin{aligned} \alpha(Q) &\leq \alpha([\![\mathsf{r}]\!] P) & \text{[(1) and monotonicity of } \alpha \text{]} \\ &\leq [\![\mathsf{r}]\!]^A \alpha(P) & \text{[soundness of } [\![\mathsf{r}]\!]^A \text{]} \\ &= \alpha(Q) & \text{[(3)]} \end{aligned}$$

So all the lines are equal, in particular α(Q) = α(⟦r⟧P). The proof is then by induction on the derivation tree of ⊢<sub>A</sub> [P] r [Q], but we only have to prove (1) and (3) because of the observation above. We only include one inductive case as an example; the others are standard.

(seq): (1) Q ≤ ⟦r<sub>2</sub>⟧R ≤ ⟦r<sub>2</sub>⟧(⟦r<sub>1</sub>⟧P) = ⟦r<sub>1</sub>;r<sub>2</sub>⟧P, where the inequalities follow from the inductive hypotheses and monotonicity of ⟦r<sub>2</sub>⟧.

(3) We recall that ⟦r<sub>1</sub>;r<sub>2</sub>⟧<sup>A</sup> ≤ ⟦r<sub>2</sub>⟧<sup>A</sup>⟦r<sub>1</sub>⟧<sup>A</sup>.

$$\begin{aligned} \alpha(Q) &\leq \alpha([\![\mathsf{r}\_1; \mathsf{r}\_2]\!] P) & \text{[(1) and monotonicity of } \alpha \text{]} \\ &\leq [\![\mathsf{r}\_1; \mathsf{r}\_2]\!]^A \alpha(P) & \text{[soundness of } [\![\mathsf{r}]\!]^A \text{]} \\ &\leq [\![\mathsf{r}\_2]\!]^A [\![\mathsf{r}\_1]\!]^A \alpha(P) & \text{[recalled above]} \\ &= [\![\mathsf{r}\_2]\!]^A \alpha(R) & \text{[inductive hypothesis (3) on } \mathsf{r}\_1 \text{]} \\ &= \alpha(Q) & \text{[inductive hypothesis (3) on } \mathsf{r}\_2 \text{]} \end{aligned}$$

So all the lines are equal, in particular ⟦r<sub>1</sub>;r<sub>2</sub>⟧<sup>A</sup>α(P) = α(Q). ⊓⊔

#### A.2 Soundness and Completeness of (refine-ext)

This technical lemma is used in the following proofs.

Lemma 1. If A′ ⪯ A then A = AA′ = A′A.

Proof. Fix a concrete element c ∈ C. Since A′ ⪯ A we have c ≤ A′(c) ≤ A(c). Applying A, by monotonicity we get A(c) ≤ AA′(c) ≤ AA(c) = A(c), where the last equality is idempotency of A. This means A = AA′. Now consider A′A(c). Since A is a closure operator, A′A(c) ≤ A(A′A(c)). But we just showed AA′(A(c)) = A(A(c)) = A(c). Lastly, since A′ is a closure operator too, A(c) ≤ A′A(c). Hence A(c) ≤ A′A(c) ≤ A(c), so A(c) = A′A(c).
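Lemma 1 can be checked by brute force on a small instance. The sketch below is an assumption-laden illustration, not part of the proof: it takes the finite universe {−2, …, 2}, uses the interval closure as A and the refinement Int ∪ {P} with P = {−1, 1} (the one mentioned in Example 8) as A′; the names `A`, `A_ref` and `hull` are hypothetical.

```python
from itertools import combinations

UNIVERSE = range(-2, 3)
P = frozenset({-1, 1})

def hull(s):
    """Smallest interval (as a set of points) containing the finite set s."""
    return frozenset(range(min(s), max(s) + 1)) if s else frozenset()

def A(s):
    """Closure operator of the interval domain on the finite universe."""
    return hull(s)

def A_ref(s):
    """Closure operator of the refinement Int ∪ {P}: P is represented exactly."""
    return P if s == P else hull(s)

# All 32 subsets of the universe.
subsets = [frozenset(c) for n in range(6) for c in combinations(UNIVERSE, n)]

# A' refines A: c <= A'(c) <= A(c) for every concrete c.
refines = all(s <= A_ref(s) <= A(s) for s in subsets)
# Lemma 1: A = A A' = A' A, checked pointwise.
lemma = all(A(s) == A(A_ref(s)) == A_ref(A(s)) for s in subsets)

print(refines, lemma)      # True True
print(A_ref(P) == A(P))    # False: the refinement is strictly more precise on P
```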

We point out that, by injectivity of γ, this also means αγ′α′ = α.

Proof (Proof of Theorem 3). We recall that the intuitive premise A⟦r⟧<sup>A′</sup>A(P) = A(Q) of the rule formally is αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q). Since the proof of Theorem 2 is by induction, we can extend it just by proving the inductive case for (refine-ext).

(1) It is the same as point (1) of extensional soundness (Theorem 2) applied to ⊢<sub>A′</sub> [P] r [Q], since this conclusion does not depend on the abstract domain. (2-3)

$$\begin{aligned} \alpha(Q) &\le \alpha([\![\mathsf{r}]\!] P) & \text{[(1) and monotonicity of } \alpha \text{]} \\ &\le [\![\mathsf{r}]\!]^A \alpha(P) & \text{[soundness of } [\![\mathsf{r}]\!]^A \text{]} \\ &= \alpha [\![\mathsf{r}]\!] \gamma \alpha(P) & \text{[definition]} \\ &= \alpha \gamma' \alpha' [\![\mathsf{r}]\!] \gamma' \alpha' \gamma \alpha(P) & \text{[Lemma 1]} \\ &= \alpha \gamma' [\![\mathsf{r}]\!]^{A'} \alpha' A(P) & \text{[definition]} \\ &= \alpha(Q) & \text{[hypothesis of the rule]} \end{aligned}$$

Hence all the lines are equal; in particular α(⟦r⟧P) = α(Q) and ⟦r⟧<sup>A</sup>α(P) = α(Q). ⊓⊔

Proof (Proof of Theorem 4). First, the hypotheses of the theorem imply ℂ<sup>A</sup><sub>P</sub>(⟦r⟧):

$$\begin{aligned} [\![\mathsf{r}]\!]^A \alpha(P) &= \alpha(Q) & \text{[hp of the theorem]} \\ &\leq \alpha([\![\mathsf{r}]\!] P) & \text{[monotonicity of } \alpha \text{ and hp of the theorem } Q \leq [\![\mathsf{r}]\!] P \text{]} \\ &\leq [\![\mathsf{r}]\!]^A \alpha(P) & \text{[soundness of } [\![\mathsf{r}]\!]^A \text{]} \end{aligned}$$

Hence α(⟦r⟧P) = ⟦r⟧<sup>A</sup>α(P) = α⟦r⟧γα(P), that is local completeness. Moreover α(Q) = α(⟦r⟧P).

Now consider a triple P, r, Q satisfying the hypotheses. If Q < ⟦r⟧P, using (relax) we get

$$\frac{P \le P \le A(P) \quad \vdash\_A [P]\ \mathsf{r}\ [[\![\mathsf{r}]\!] P] \quad Q \le [\![\mathsf{r}]\!] P \le A(Q)}{\vdash\_A [P]\ \mathsf{r}\ [Q]} \text{ (relax)}$$

But the first condition is trivial, and the third one is made of Q ≤ ⟦r⟧P (the hypothesis) and ⟦r⟧P ≤ A(Q), which follows because α(⟦r⟧P) = α(Q) (shown above) and in a GC this implies ⟦r⟧P ≤ γα(Q) = A(Q). Hence without loss of generality we can assume Q = ⟦r⟧P.

Now we want to apply (refine-ext) to move to the concrete domain C. Clearly C ⪯ A. The last hypothesis of the rule can be readily verified recalling that ⟦r⟧<sup>C</sup> = ⟦r⟧ and α′ = γ′ = id<sub>C</sub>:

$$\begin{aligned} \alpha [\![\mathsf{r}]\!]^C A(P) &= \alpha [\![\mathsf{r}]\!] A(P) \\ &= [\![\mathsf{r}]\!]^A \alpha(P) \\ &= \alpha([\![\mathsf{r}]\!] P) \end{aligned}$$

so if we can show ⊢<sub>C</sub> [P] r [⟦r⟧P] we can apply (refine-ext) to prove the triple ⊢<sub>A</sub> [P] r [⟦r⟧P]:

$$\frac{\vdash\_C [P]\ \mathsf{r}\ [[\![\mathsf{r}]\!] P] \quad C \preceq A \quad A [\![\mathsf{r}]\!]^C A(P) = A([\![\mathsf{r}]\!] P)}{\vdash\_A [P]\ \mathsf{r}\ [[\![\mathsf{r}]\!] P]} \text{ (refine-ext)}.$$

Lastly, we resort to logical completeness of LCL<sub>A</sub> (cf. [2], Th 5.11) to say that the triple ⊢<sub>C</sub> [P] r [⟦r⟧P] is provable. The hypotheses of that theorem are satisfied: all expressions are globally complete in the concrete domain C, ⟦r⟧P ≤ ⟦r⟧P and ⟦r⟧<sup>♯</sup><sub>C</sub> id<sub>C</sub>(P) = ⟦r⟧P = id<sub>C</sub>(⟦r⟧P), where we used α′ = id<sub>C</sub> and ⟦r⟧<sup>♯</sup><sub>C</sub> = ⟦r⟧. ⊓⊔

#### A.3 Derived Refinement Rules

Proof (Proof of Proposition 1). We show that the hypotheses of (refine-int) imply those of (refine-ext). This means that whenever we can apply the former we could also apply the latter, which in turn means that Theorem 3 ensures extensional soundness.

The first two hypotheses ⊢<sub>A′</sub> [P] r [Q] and A′ ⪯ A are shared among the two rules, so we only have to show αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q). We recall that ⊢<sub>A′</sub> [P] r [Q] implies Q ≤ ⟦r⟧P by extensional soundness.


Hence all the lines are equal, and in particular αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q). ⊓⊔

Proof (Proof of Theorem 5). The proof is the same as that of Theorem 4, the only difference being that to apply (refine-int) we need to show A⟦r⟧<sup>♯</sup><sub>C</sub>A(P) = A(⟦r⟧P) instead of A⟦r⟧<sup>C</sup>A(P) = A(⟦r⟧P). However, since in the concrete domain ⟦r⟧<sup>♯</sup><sub>C</sub> = ⟦r⟧<sup>C</sup> = ⟦r⟧, the proof still holds. ⊓⊔

Proof (Proof of Proposition 2). As in the proof of Proposition 1 above, we show that the hypotheses of (refine-pre) imply those of (refine-ext).

The first two hypotheses ⊢<sub>A′</sub> [P] r [Q] and A′ ⪯ A are shared among the two rules, so we only have to show αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q). We recall that ⊢<sub>A′</sub> [P] r [Q] implies by extensional soundness (1) Q ≤ ⟦r⟧P and (3) ⟦r⟧<sup>A′</sup>α′(P) = α′(Q).


Hence all the lines are equal, and in particular αγ′⟦r⟧<sup>A′</sup>α′A(P) = α(Q). ⊓⊔

Details about Example 5. The full derivation of the triple ⊢<sub>Oct</sub> [R] r<sub>2</sub> [Q] for Example 5 is shown in Fig. 6, rotated and split to fit the page. The command r<sub>i</sub> = (x > 1)?; y := y - 1; x := x - 1 is iterated with the Kleene star and we let R<sub>2</sub> = (y ∈ {1; 2; 100} ∧ x = y). We also used the logical implication R<sub>2</sub> ⟹ (y ∈ {1; 99} ∧ x = y), both explicitly and implicitly in the equivalence R<sub>2</sub> ∨ (y ∈ {1; 99} ∧ x = y) = R<sub>2</sub>.

Fig. 6: Derivation of ⊢<sub>Oct</sub> [R] r<sub>2</sub> [Q] for Example 5.


Fig. 7: Derivation of ⊢<sub>Int</sub> [P] r [Q] for Example 8.

Example 8 (Supplement to Example 6). Consider the concrete domain C = ℘(ℤ) of integers, the abstract domain Int of intervals, the concrete points P = {−1, 1} and Q = {1}, commands r<sub>1</sub> ≜ x != 0?, r<sub>2</sub> ≜ x >= 0? and r ≜ r<sub>1</sub>;r<sub>2</sub>. Let f<sub>1</sub> = ⟦r<sub>1</sub>⟧, f<sub>2</sub> = ⟦r<sub>2</sub>⟧ and f = ⟦r⟧ = f<sub>2</sub> ∘ f<sub>1</sub>. Observe that in the concrete semantics f<sub>1</sub>(P) = P and f(P) = f<sub>2</sub>(P) = {1}. Consider LCL<sub>A</sub> extended with (refine-pre), and let us show that we cannot prove ⊢<sub>Int</sub> [P] r [Q]. Inspecting the logic, we can only apply three rules to prove this triple: (relax), (refine-pre) or (seq). To apply rule (relax) we would need either an under-approximation P′ of P with the same abstraction, which does not exist, or an over-approximation of Q, which would be unsound since Q = f(P). Hence we cannot apply (relax). Suppose we apply (refine-pre): any A′ used in the rule should satisfy A′ ⪯ Int and A′(P) = Int(P); as we remarked in Example 6, this means that P′ ⊂ P implies A′(P′) ⊂ A′(P). Again this means we cannot apply (relax) even after the domain refinement. The only rule that can be applied is then (seq): to do that, we must prove two triples ⊢<sub>A′</sub> [P] r<sub>1</sub> [R] and ⊢<sub>A′</sub> [R] r<sub>2</sub> [Q]. Irrespective of how we prove the first triple, by soundness (Theorem 2) we have R ⊆ f<sub>1</sub>(P) = P and A′(R) = A′(f<sub>1</sub>(P)) = A′(P), so again R = P. Now we should prove a triple ⊢<sub>A′</sub> [P] r<sub>2</sub> [Q], but this is impossible since by soundness this would imply local completeness of ⟦r<sub>2</sub>⟧ = f<sub>2</sub> on P in A′, which does not hold:

$$\begin{aligned} A'f\_2(P) &= A'(\{1\}) \subseteq \mathsf{Int}(\{1\}) = \{1\} \\ A'f\_2A'(P) &\supseteq f\_2A'(P) = f\_2(\mathsf{Int}(P)) = \{0,1\} \end{aligned}$$

Observe that, if we add (refine-int) to the proof system, we can use it to change the domain to one where we can express P (for instance, the concrete domain ℘(ℤ) or the refinement Int ∪ {P}) to prove the triple applying (seq) and then (transfer) on both subtrees, as shown in Fig. 7. ⊓⊔
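The contrast between Int and the refinement Int ∪ {P} can be replayed on this example. The following sketch only checks the local completeness equation A f(S) = A f A(S) on the relevant inputs; the names `hull`, `refined` and `locally_complete` are hypothetical, and abstract elements are represented by their finite sets of points.

```python
P = frozenset({-1, 1})

def hull(s):
    """Interval closure Int(S) of a finite set, as a set of points."""
    return frozenset(range(min(s), max(s) + 1)) if s else frozenset()

def refined(s):
    """Closure of the refinement Int ∪ {P}: like Int, but P is kept exact."""
    return P if s == P else hull(s)

def f1(s):
    """Concrete semantics of `x != 0?`."""
    return frozenset(x for x in s if x != 0)

def f2(s):
    """Concrete semantics of `x >= 0?`."""
    return frozenset(x for x in s if x >= 0)

def locally_complete(closure, f, s):
    """Check A f(S) == A f A(S) for the closure A on the input S."""
    return closure(f(s)) == closure(f(closure(s)))

print(locally_complete(hull, f2, P))         # False: fails in Int
print(locally_complete(refined, f1, P))      # True
print(locally_complete(refined, f2, f1(P)))  # True: both steps succeed after refinement
```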

# References



# Clustered Relational Thread-Modular Abstract Interpretation with Local Traces

Michael Schwarz1() , Simmo Saan<sup>2</sup> , Helmut Seidl<sup>1</sup> , Julian Erhard<sup>1</sup> , and Vesal Vojdani<sup>2</sup>

<sup>1</sup> Technische Universität München, Garching, Germany {m.schwarz, helmut.seidl, julian.erhard}@tum.de <sup>2</sup> University of Tartu, Tartu, Estonia {simmo.saan, vesal.vojdani}@ut.ee

Abstract. We construct novel thread-modular analyses that track relational information for potentially overlapping clusters of global variables – given that they are protected by common mutexes. We provide a framework to systematically increase the precision of clustered relational analyses by splitting control locations based on abstractions of local traces. As one instance, we obtain an analysis of dynamic thread creation and joining. Interestingly, tracking less relational information for globals may result in higher precision. We consider the class of 2-decomposable domains that encompasses many weakly relational domains (e.g., Octagons). For these domains, we prove that maximal precision is attained already for clusters of globals of sizes at most 2.

Keywords: thread-modular relational abstract interpretation, collecting local trace semantics, clusters, dynamic thread creation, concurrency

# 1 Introduction

Tracking relationships between program variables is indispensable for proving properties of programs or verifying the absence of certain programming errors [14, 16, 33]. Inferring relational properties is particularly challenging for multithreaded programs as all interferences by other threads that may happen in parallel, must be taken into account. In such an environment, only relational properties between globals protected by common mutexes are likely to persist throughout program execution. Generally, relations on clusters consisting of fewer variables are less brittle than those on larger clusters. Moreover, monolithic relational analyses employing, e.g., the polyhedral abstract domain are known to be notoriously expensive [36, 54]. Tracking smaller clusters may even be more precise than tracking larger clusters [19].

Example 1. Consider the following program. All accesses to globals g, h, and i are protected by the mutex a.

```
main :
 x = create(t1); y = create(t2);
 lock(a);
 g = ?; h = ?; i = ?;
 unlock(a); r = join(y); lock(a);
 z = ?; g = z; h = z; i = z;
 unlock(a); lock(a);
 // ASSERT(h==i); (1)
 // ASSERT(g==h); (2)
 unlock(a);

t1 :
 lock(a);
 x = h;
 i = x;
 unlock(a);
 return 1;

t2 :
 lock(a);
 g = ?; h = ?;
 unlock(a);
 return 0;
```
In this program, the main thread creates two new threads, starting at t<sub>1</sub> and t<sub>2</sub>, respectively. Then it locks the mutex a to set all globals non-deterministically to some value and unlocks a again. After having joined the thread t<sub>2</sub>, it locks a again and sets all globals to the same unknown value and unlocks a again. Thread t<sub>1</sub> sets i to the value of h. Thread t<sub>2</sub> sets g and h to (potentially different) unknown values. Assume we are interested in equalities between globals. In order to succeed in showing assertion (1), it is necessary to detect that the main thread is unique and thus cannot read its past writes since these have been overwritten. Additionally, the analysis needs to certify that thread t<sub>2</sub> also is unique, has been joined before the assertion, and that its writes must also have been overwritten.

For an analysis to prove assertion (2), propagating a joint abstraction of the values of all globals protected by a does not suffice: at the unlock of a in t<sub>1</sub>, g=h need not hold. If this monolithic relation is propagated to the last lock of a in main, (2) cannot be shown, even though t<sub>1</sub> modifies neither g nor h. ⊓⊔

Here we show that the loss of precision indicated in the example can be remedied by replacing the monolithic abstraction of all globals protected by a mutex with suitably chosen subclusters. In the example, we propose to instead consider the subclusters {g, h} and {h, i} separately. As t<sub>1</sub> does not write any values to the cluster {g, h}, the imprecise relation ⊤ is not propagated to the main thread and assertion (2) can be shown.
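The effect of subclusters on Example 1 can be imitated with a deliberately crude model. The sketch below is not the analysis developed in this paper: it assumes that each thread publishes, at its last unlock of a, the set of equalities it knows together with the globals it wrote; it ignores t<sub>2</sub> and the earlier writes of main (which are assumed to have been recognized as overwritten); and all names (`published`, `restrict`, `join`) are hypothetical.

```python
def eq(a, b):
    """An equality a = b between two globals, as an unordered pair."""
    return frozenset((a, b))

def restrict(rel, cluster):
    """Forget every equality that mentions a global outside the cluster."""
    return frozenset(e for e in rel if e <= cluster)

def join(rels):
    """Least upper bound: only equalities present in all contributions survive."""
    return frozenset.intersection(*rels)

# Hypothetical summaries: (equalities known at the last unlock(a), globals written).
published = {
    "main": (frozenset({eq("g", "h"), eq("h", "i"), eq("g", "i")}), {"g", "h", "i"}),
    "t1": (frozenset({eq("h", "i")}), {"i"}),
}

# Monolithic: every thread publishes one relation over all globals protected by a.
monolithic = join([rel for rel, _ in published.values()])

# Clustered: a thread contributes to a cluster only if it wrote one of its globals.
clusters = [frozenset({"g", "h"}), frozenset({"h", "i"})]
clustered = frozenset()
for cluster in clusters:
    contributions = [restrict(rel, cluster)
                     for rel, written in published.values() if written & cluster]
    clustered |= join(contributions)  # meet of the per-cluster results

print(eq("g", "h") in monolithic)  # False: assertion (2) cannot be shown
print(eq("g", "h") in clustered)   # True: t1 contributes nothing to {g, h}
```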

To fine-tune the analysis, we rely on weakly relational domains. A variety of weakly relational domains have been proposed in the literature such as Two Variables Per Inequality [53], Octagons [36, 37], or simplifications thereof [33, 35]. The technical property of interest which all these domains have in common is that each abstract relation can be reconstructed from its projections onto subclusters of variables of size at most 2. We call such domains 2-decomposable. Beyond the numerical 2-decomposable domains, also non-numerical 2-decomposable domains can be constructed such as a domain relating string names and function pointers.

Based on 2-decomposable domains, we design thread-modular relational analyses of globals which may attain additional precision by taking local knowledge of threads into account. Therefore, we do not rely on a global trace semantics, but on a local trace semantics which formalizes for each thread that part of the computational past it can observe [48]. Abstract values for program points describe the set of all reaching local traces. Likewise, values recorded for observable actions are abstractions of all local traces ending in the corresponding action. Such observable actions are, e.g., unlock operations for mutexes. The abstract values are then refined by taking finite abstractions of local traces into account. To this end, we propose a generic framework that re-uses the components of any base analysis as black boxes. Our contributions can be summarized as follows:


The analyses in this paper have all been implemented; a report of a practical evaluation is included in Section 9, whereas Section 10 details related work.

# 2 Relational Domains

First, we define the notion of relational domain employed in the description of our analysis. Let Vars be a set of variables, potentially of different types. We assume all configurations and assignments to be well-typed, i.e., the type of the (abstract) value matches the one specified for a variable. For each type τ of values, we assume a complete lattice V<sup>♯</sup><sub>τ</sub> of abstract values abstracting the respective concrete values from V<sub>τ</sub>. Let V<sup>♯</sup> denote the collection of these lattices, and Vars →<sub>⊥</sub> V<sup>♯</sup> denote the set of all type-consistent assignments σ from variables to non-⊥ abstract values, extended with a dedicated least element (also denoted by ⊥), and equipped with the induced ordering. A relational domain R then is a complete lattice which provides the following operations

$$\begin{array}{ll} [\![x \leftarrow e]\!]\_{\mathcal{R}}^{\sharp} : \mathcal{R} \to \mathcal{R} \text{ (assignment for expression } e) & \mathsf{lift} : (\mathsf{Vars} \to\_{\perp} \mathsf{V}^{\sharp}) \to \mathcal{R} \\ r|\_{Y} : \mathcal{R} \to \mathcal{R} \text{ (restriction to } Y \subseteq \mathsf{Vars}) & \mathsf{unlift} : \mathcal{R} \to (\mathsf{Vars} \to\_{\perp} \mathsf{V}^{\sharp}) \\ [\![?e]\!]\_{\mathcal{R}}^{\sharp} : \mathcal{R} \to \mathcal{R} \text{ (guard for condition } e) & \end{array}$$

The operations to the left provide the abstract state transformers for the basic operations of programs (with non-deterministic assignments expressed as restrictions), while lift and unlift allow casting from abstract variable assignments to the relational domain as well as extracting single-variable information. We assume that lift ⊥ = ⊥ and unlift ⊥ = ⊥, and require that unlift ∘ lift ⊒ id where ⊒ refers to the ordering of (Vars →<sub>⊥</sub> V<sup>♯</sup>). Moreover, we require that the meet operations ⊓ of V<sup>♯</sup> and R safely approximate the intersection of the concretizations of the respective arguments. Restricting a relation r to a subset Y of variables amounts to forgetting all information about variables not in Y. Thus, we demand r|<sub>Vars</sub> = r, r|<sub>∅</sub> = ⊤, r|<sub>Y<sub>1</sub></sub> ⊒ r|<sub>Y<sub>2</sub></sub> when Y<sub>1</sub> ⊆ Y<sub>2</sub>, (r|<sub>Y<sub>1</sub></sub>)|<sub>Y<sub>2</sub></sub> = r|<sub>Y<sub>1</sub>∩Y<sub>2</sub></sub>, and

$$\mathsf{unlift}\left(r|\_{Y}\right)x = \top \quad \left(x \notin Y\right) \qquad\qquad \mathsf{unlift}\left(r|\_{Y}\right)x = \left(\mathsf{unlift}\,r\right)x \quad \left(x \in Y\right) \tag{1}$$

Restriction thus is idempotent. For convenience, we also define a shorthand for the assignment of abstract values<sup>3</sup>: ⟦x ←<sup>♯</sup> v⟧<sup>♯</sup><sub>R</sub> r = r|<sub>Vars\{x}</sub> ⊓ (lift(⊤ ⊕ {x ↦ v})). In order to construct an abstract interpretation, we further require monotonic concretization functions γ<sub>V♯</sub> : V<sup>♯</sup> → 2<sup>V</sup> and γ<sub>R</sub> : R → 2<sup>Vars→V</sup> satisfying the requirements presented in Fig. 1.

Example 2. As a value domain V<sup>♯</sup><sub>τ</sub>, consider the flat lattice over the sets of values of appropriate type τ. A relational domain R<sub>1</sub> is obtained by collecting satisfiable conjunctions of equalities between variables or variables and constants where the ordering is logical implication, extended with False as least element. The greatest element in this complete lattice is given by True. The operations lift and unlift for non-⊥ arguments then can be defined as

$$\mathsf{lift}\,\sigma = \bigwedge \{ x = \sigma \, x \mid x \in \mathsf{Vars}, \sigma \, x \neq \top \} \qquad \mathsf{unlift}\,r\,x = \begin{cases} c & \text{if } r \implies (x = c) \\ \top & \text{otherwise} \end{cases}$$

The restriction of r to a subset Y of variables is given by the conjunction of all equalities implied by r which only contain variables from Y or constants. □
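The operations of Example 2 can be illustrated by a minimal sketch. The representation is an assumption made for illustration only: a non-⊥ conjunction of equalities is stored as a dictionary mapping a variable either to `("const", c)` (the conjunction implies x = c) or to `("class", y)` (x equals the variable y, with no constant known). The names `TOP`, `lift`, `unlift`, and `restrict` are hypothetical and do not refer to an existing implementation.

```python
TOP = "TOP"  # hypothetical marker for the greatest element of the flat value lattice


def lift(sigma):
    """lift sigma: one equality x = c per variable whose abstract value is a constant."""
    return {x: ("const", c) for x, c in sigma.items() if c != TOP}


def unlift(r, variables):
    """(unlift r) x = c if r implies x = c, and TOP otherwise."""
    return {x: r[x][1] if r.get(x, ("none",))[0] == "const" else TOP
            for x in variables}


def restrict(r, keep):
    """r|_Y: keep exactly the equalities that mention only variables from `keep`."""
    result = {x: tag for x, tag in r.items() if x in keep and tag[0] == "const"}
    classes = {}
    for x, tag in r.items():
        if x in keep and tag[0] == "class":
            classes.setdefault(tag[1], []).append(x)
    for members in classes.values():
        if len(members) > 1:  # a class with a single member carries no equality
            representative = min(members)
            for x in members:
                result[x] = ("class", representative)
    return result


# x = y and z = 3
r = {"x": ("class", "x"), "y": ("class", "x"), "z": ("const", 3)}
```

Under this encoding, restricting `r` to {y, z} drops the equality x = y and keeps z = 3, and unlift reports ⊤ for every variable outside the restriction, in line with Eq. (1).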

In line with Example 2, non-numerical relational domains may be constructed as well.

A variable clustering S ⊆ 2<sup>Vars</sup> is a set of subsets (clusters) of variables. For any cluster Y ⊆ Vars, let R<sub>Y</sub> = {r | r ∈ R, r|<sub>Y</sub> = r}; this set collects all abstract values from R containing information on variables in Y only. Given an arbitrary clustering S ⊆ 2<sup>Vars</sup>, any relation r ∈ R can be approximated by a meet of relations from R<sub>Y</sub> (Y ∈ S) since for every r ∈ R, r ⊑ ⨅ {r|<sub>Y</sub> | Y ∈ S} holds.

$$\begin{array}{l} \forall a, b : a \sqsubseteq b \implies \gamma\_{\mathsf{V}^{\sharp}}\, a \subseteq \gamma\_{\mathsf{V}^{\sharp}}\, b \\\ \gamma\_{\mathcal{R}}\, \bot = \emptyset \\\ \forall r, s : r \sqsubseteq s \implies \gamma\_{\mathcal{R}}\, r \subseteq \gamma\_{\mathcal{R}}\, s \\\ \gamma\_{\mathcal{R}}\, (\llbracket x \leftarrow e \rrbracket\_{\mathcal{R}}^{\sharp}\, r) \supseteq \{ \sigma \oplus \{ x \mapsto \llbracket e \rrbracket \sigma \} \mid \sigma \in \gamma\_{\mathcal{R}}\, r \} \\\ \gamma\_{\mathcal{R}}\, (r|\_{Y}) \supseteq \{ \sigma \oplus \{ x\_1 \mapsto v\_1, \ldots, x\_m \mapsto v\_m \} \mid v\_i \in \mathsf{V}, x\_i \in \mathsf{Vars} \setminus Y, \sigma \in \gamma\_{\mathcal{R}}\, r \} \\\ \gamma\_{\mathcal{R}}\, (\mathsf{lift}\, \sigma^{\sharp}) \supseteq \{ \sigma \mid \forall x : \sigma\, x \in \gamma\_{\mathsf{V}^{\sharp}} (\sigma^{\sharp}\, x) \} \\\ \gamma\_{\mathsf{V}^{\sharp}}\, ((\mathsf{unlift}\, r)\, x) \supseteq \{ \sigma\, x \mid \sigma \in \gamma\_{\mathcal{R}}\, r \} \end{array}$$

Fig. 1: Required properties for γ<sub>V♯</sub> : V<sup>♯</sup> → 2<sup>V</sup> and γ<sub>R</sub> : R → 2<sup>Vars→V</sup>.

<sup>3</sup> We use σ ⊕ {x<sub>i</sub> ↦ v<sub>i</sub> | i = 1, …, m} to denote the variable assignment obtained from σ by replacing the values for x<sub>i</sub> with v<sub>i</sub> (i = 1, …, m).

Some relational domains, however, can be fully recovered from their restrictions to specific subsets of clusters. We consider for k ≥ 1, the set S<sub>k</sub> of all non-empty subsets Y ⊆ Vars of cardinality at most k. We call a relational domain R k-decomposable if each abstract value from R can be precisely expressed as the meet of its restrictions to clusters of S<sub>k</sub> and when all least upper bounds can be recovered by computing with clusters of S<sub>k</sub> only; that is,

$$r = \bigsqcap \left\{ r \vert\_{Q} \mid Q \in \mathcal{S}\_{k} \right\} \qquad \qquad (\bigsqcup R) \vert\_{Q} = \bigsqcup \left\{ r \vert\_{Q} \mid r \in R \right\} \quad (Q \in \mathcal{S}\_{k}) \tag{2}$$

holds for each abstract relation r of the relational domain and each set R of abstract relations from it.
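The first half of Eq. (2) can be checked on a toy instance. The sketch below assumes a deliberately simple encoding that is not taken from the paper: a relation is the set of all variable equalities it implies, each equality being a two-element `frozenset`, so that the meet of such closed sets is their union. The function names are hypothetical.

```python
from itertools import combinations


def clusters_up_to(variables, k):
    """S_k: all non-empty subsets of `variables` with at most k elements."""
    ordered = sorted(variables)
    return [frozenset(c) for size in range(1, k + 1)
            for c in combinations(ordered, size)]


def restrict(equalities, cluster):
    """Keep only the equalities whose variables all lie in `cluster`."""
    return {eq for eq in equalities if eq <= cluster}


def meet(relations):
    """Meet of conjunctions of equalities: the union of all conjuncts."""
    return set().union(*relations)


# g = h = i, listed with all implied pairwise equalities
r = {frozenset({"g", "h"}), frozenset({"h", "i"}), frozenset({"g", "i"})}
s1 = clusters_up_to({"g", "h", "i"}, 1)
s2 = clusters_up_to({"g", "h", "i"}, 2)
```

With clusters of size at most two, the meet of the restrictions gives back `r`; with singleton clusters only, all equalities are lost, which is why this toy relation is recovered from S<sub>2</sub> but not from S<sub>1</sub>.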

Example 3. The domain R<sub>1</sub> from the previous example is 2-decomposable. This also holds for the octagon domain [36] and many other weakly relational numeric domains (pentagons [33], weighted hexagons [21], logahedra [28], TVPI [53], dDBM [46], and AVO [11]). The affine equalities or affine inequalities domains [16, 30], however, are not. The relational string domains proposed by Arceri et al. [6, Sec. 5.1 - 5.3] are examples of non-numeric 2-decomposable domains.

# 3 A Local Trace Semantics

We build upon the semantic framework for local traces, introduced by Schwarz et al. [48]. A local trace records all past events that have affected the present configuration of a specific thread, referred to as the ego thread. In [48], the local trace semantics is proven equivalent to the global trace semantics which itself is equivalent to a global interleaving semantics. In particular, any analysis that is sound w.r.t. the local trace semantics is also sound w.r.t. the interleaving semantics.

While the framework of Schwarz et al. [48] allows for different formalizations of traces, thread synchronization happens only via locking/unlocking and thread creation. Generalizing their semantics, we identify certain actions as observable by other threads when executing corresponding observing actions (see Table 1). When the ego thread executes an observing action, a local trace ending in the corresponding observable action is incorporated. Here, we consider as observable/observing actions locking/unlocking mutexes and creating/joining threads.

Consider, e.g., the program in Fig. 2a and a corresponding local trace (Fig. 2b). This trace consists of one swim lane for each thread representing the sequence of steps it executed where each node in the graph represents a configuration attained by it. Additionally, the trace records the create and join orders as well as for each mutex a, the locking order for a (→<sub>c</sub>, →<sub>j</sub>, and →<sub>a</sub>, respectively). These orders introduce extra relationships between thread configurations. The unique start node of each local trace is an initial configuration of the main thread.

Fig. 2: An example program and a corresponding local trace.

We distinguish between the sets X and G of local and global variables. We assume that X contains a special variable self within which the thread id of the current thread, drawn from the set I, is maintained. A (local) thread configuration is a pair (u, σ) where u is a program point and the type-consistent map σ : X → V provides values for the local variables. The values of globals are not explicitly represented in a thread configuration, but can be recovered by consulting the (unique) last write to this global within the local trace. To model weak memory effects, weaker notions of last writes are conceivable. As in [48], we consider a set of actions Act that consists of locking and unlocking a (non-reentrant) mutex from a set M, copying values of globals into locals and vice-versa, creating a new thread, as well as assignments with and branching on local variables. We extend Act with actions for returning from and joining with threads. We assume that writes to and reads from globals are atomic (or more precisely, we assume copying values of integral type to be atomic). This is enforced for each global g by a dedicated mutex m<sub>g</sub> acquired just before accessing g and released immediately after. For simplicity, we associate traces corresponding to a write of g to this dedicated mutex m<sub>g</sub>, and thus do not need to consider writing and reading of globals as observable/observing actions. In examples, we omit explicitly locking and unlocking these mutexes. By convention, at program start all globals have value 0, while local variables may initially have any value.

Each thread is represented by a control-flow graph with edges e ∈ E of the form e = (u, act, u′) for some action act ∈ Act and program points u and u′ where the start point of the main thread is u<sub>0</sub>. Let T denote the set of all local traces of a given program. A formalism for local traces must, for each edge e of the control-flow graph, provide a transformation ⟦e⟧ : T<sup>k</sup> → 2<sup>T</sup> so that ⟦e⟧(t<sub>0</sub>, …, t<sub>k−1</sub>) extends the local trace t<sub>0</sub>, possibly incorporating other local traces. For the operations lock(a), a ∈ M, or x=join(x′), x, x′ ∈ X, the arity of ⟦e⟧ is two: another local trace, namely one whose last operation is unlock(a) or return x″, respectively, is incorporated. The remaining edge transformations have arity one. In all cases, the set of resulting local traces may be empty when the operation is not applicable to its argument(s). We write ⟦e⟧(T<sub>0</sub>, …, T<sub>k−1</sub>) for the set ⋃ {⟦e⟧(t<sub>0</sub>, …, t<sub>k−1</sub>) | t<sub>0</sub> ∈ T<sub>0</sub>, …, t<sub>k−1</sub> ∈ T<sub>k−1</sub>}.
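The lifting of an edge transformation to sets of traces is a plain union over all argument combinations. The following sketch assumes, purely for illustration, that a local trace is a tuple of event names and that a transformation is a Python function returning a set of traces; `lift_to_sets` and `lock_a` are hypothetical names.

```python
from itertools import product


def lift_to_sets(transform, *trace_sets):
    """Apply a k-ary edge transformation to every combination of argument traces."""
    result = set()
    for traces in product(*trace_sets):
        result |= transform(*traces)
    return result


def lock_a(t0, t1):
    """Toy binary transformation: applicable only if t1 ends in unlock(a)."""
    if t1[-1] == "unlock(a)":
        return {t0 + ("lock(a)",)}
    return set()  # not applicable: no resulting local trace
```

Combinations for which the operation is not applicable contribute the empty set and therefore vanish from the union.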

Given definitions of ⟦e⟧, the set T can be inductively defined starting from a set init of initial local traces consisting of initial configurations of the main thread. To develop efficient thread-modular abstractions, we are interested in subsets T[u], T[a], T[i] of local traces ending at some program point u, ending with an unlock operation for mutex a (or from init), or ending with a return statement of thread i, respectively. Schwarz et al. [48] showed that such subsets can be described as the least solution of a side-effecting constraint system [5]. There, each right-hand side may, besides its contribution to the unknown on the left, also provide contributions to other unknowns (the side-effects). This allows expressing analyses that accumulate flow-insensitive information about globals during a flow-sensitive analysis of local states with dynamic control flow [51]. Here, in the presence of dynamic thread creation, we use side-effects to express that an observable action, unlock or return, should also contribute to the sets T[a] or T[i], such that they can be incorporated at the corresponding observing action. The side-effecting formulation of our concrete semantics takes the form:

$$(\eta, \eta \left[ u\_0 \right]) \sqsupseteq (\{ \left[ a \right] \mapsto \mathsf{init} \mid a \in \mathsf{M} \}, \mathsf{init}) \qquad (\eta, \eta \left[ u' \right]) \sqsupseteq \llbracket u, \mathsf{act} \rrbracket\, \eta \quad \text{for } (u, \mathsf{act}, u') \in \mathcal{E} \tag{3}$$

where the ordering ⊒ is induced by the superset ordering and right-hand sides are defined in Fig. 3. A right-hand side takes an assignment η of the unknowns of the system and returns a pair (η′, T) where T is the contribution to the unknown occurring on the left (as in ordinary constraint systems). The first component collects the side-effects as the assignment η′. If the right-hand sides are monotonic, Eq. (3) has a unique least solution.
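To make the shape of such a system concrete, the following sketch computes the least solution of a small side-effecting constraint system over finite sets by naive round-robin iteration. It is an illustration under simplifying assumptions (finitely many unknowns, finite sets, monotonic right-hand sides), not the solver used for the analyses; `solve`, `start`, and `unlock_a` are hypothetical names, and traces are abbreviated as strings.

```python
def solve(constraints, unknowns):
    """Round-robin iteration to the least solution over sets ordered by inclusion.

    `constraints` is a list of (lhs, rhs) pairs; rhs(eta) returns a pair
    (side_effects, contribution), where side_effects maps unknowns to sets.
    """
    eta = {x: frozenset() for x in unknowns}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            side_effects, contribution = rhs(eta)
            updates = dict(side_effects)
            updates[lhs] = updates.get(lhs, frozenset()) | contribution
            for x, values in updates.items():
                merged = eta[x] | values
                if merged != eta[x]:
                    eta[x] = merged
                    changed = True
    return eta


def start(eta):
    """Initial constraint: side-effect init to [a] and contribute init to u0."""
    return {"[a]": {"init"}}, {"init"}


def unlock_a(eta):
    """Edge (u0, unlock(a), u1): the observable action also contributes to [a]."""
    traces = {t + ";unlock(a)" for t in eta["u0"]}
    return {"[a]": traces}, traces


solution = solve([("u0", start), ("u1", unlock_a)], ["u0", "u1", "[a]"])
```

In the resulting assignment, the unknown `[a]` collects both the initial trace and the trace ending in `unlock(a)`, although no constraint has `[a]` on its left-hand side.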

Fig. 3: Right-hand sides for side-effecting formulation of concrete semantics; t(y) extracts the value of local variable y from the terminal configuration of trace t.

We only detail the right-hand sides for the creation of threads as well as the new actions join and return; the rest remain the same as defined by Schwarz et al. [48]. For thread creation, they provide the action x=create(u<sub>1</sub>). Here, u<sub>1</sub> is the program point at which the created thread should start. We assume that all locals from the creator are passed to the created thread, except for the variable self. The variables self in the created thread and x in the creating thread receive a fresh thread id. Here, new u u<sub>1</sub> t computes the local trace at the start point u<sub>1</sub> from the local trace t of the creating thread. To handle returning and joining of threads we introduce the following two actions:


For returning results and realization of join, we employ the unknown [i] for the thread id i of the returning thread, as shown in Fig. 3.

# 4 Relational Analyses as Abstractions of Local Traces

Subsequently, we give relational analyses of the values of globals which we base on the local trace semantics. They are generic in the relational domain R, with 2-decomposable domains being particularly well-suited, as the concept of clusters is central to the analyses. We focus on relations between globals that are jointly write-protected by some mutex. We assume we are given for each global g, a set M[g] of (write) protecting mutexes, i.e., mutexes that are always held when g is written. Let G[a] = {g ∈ G | a ∈ M[g]} denote the set of globals protected by a mutex a. Let ∅ ≠ Q<sub>a</sub> ⊆ 2<sup>G[a]</sup> be the set of clusters of these globals we associate with a. For technical reasons, we require at least one cluster per mutex a, which may be the empty cluster ∅, thus not associating any information with a.

Our basic idea is to store at the unknown [a, Q] (for each mutex a and cluster Q ∈ Q<sub>a</sub>) an abstraction of the relations only between globals in Q. By construction, all globals in Q are protected by a. Whenever a is locked, the relational information stored at all [a, Q] is incorporated into the local state by the lattice operation meet, i.e., the local state now maintains relations between locals as well as globals which no other thread can access at this program point. Whenever a is unlocked, the new relation between globals in all corresponding clusters Q ∈ Q<sub>a</sub> is side-effected to the respective unknowns [a, Q]. Simultaneously, all information on globals no longer protected is forgotten to obtain the new local state. In this way, the analysis is fully relational in the local state, while only keeping relations within clusters of globals jointly protected by some mutex.

For clarity of presentation, we perform control-point splitting on the set of held mutexes when reaching program points. Apart from this, the constraint system and right-hand sides for the analysis closely follow those of the concrete semantics (Section 3), with the exception that unknowns now take values from R and that unknowns [a] are replaced with unknowns [a, Q] for Q ∈ Q<sub>a</sub>.

All right-hand sides are given in detail in Fig. 4. For the start point of the program and the empty lockset, the right-hand side init<sup>♯</sup> returns the ⊤ relation updated such that the variable self holds the abstract thread id i<sub>0</sub> of the main thread. Additionally, init<sup>♯</sup> produces a side-effect for each mutex a and cluster Q that initializes all globals from the cluster with the value 0.

Fig. 4: Right-hand sides for the basic analysis. All functions are strict in ⊥ (describing the empty set of local traces); we only display definitions for non-⊥ abstract values here. ⟦{g ← 0 | g ∈ Q}⟧<sup>♯</sup><sub>R</sub> is shorthand for the abstract transformer corresponding to the assignment of 0 to all variables in Q one-by-one.

For a thread creating edge starting in program point u with lockset S, the right-hand side ⟦[u, S], x=create(u<sub>1</sub>)⟧<sup>♯</sup> first generates a new abstract thread id, which we assume can be computed using function ν<sup>♯</sup>. The new id is assigned to the variable x in the local state of the current thread. Additionally, the start state r′ for the newly created thread is constructed and side-effected to the thread's start point with empty lockset [u<sub>1</sub>, ∅]. Since threads start with empty lockset, the state r′ is obtained by removing all information about globals from the local state of the creator and assigning the new abstract thread id to the variable self.

When locking a mutex a, the states stored at unknowns [a, Q] with Q ∈ Q<sub>a</sub> are combined with the local state by meet. This is sound because the value stored at any [a, Q] only maintains relationships between variables write-protected by a, and these values soundly account for the program state at every unlock(a) and at program start. When unlocking a, on the other hand, the local state restricted to the appropriate clusters Q ∈ Q<sub>a</sub> is side-effected to the respective unknowns [a, Q], so that the changes made to variables in the cluster become visible to other threads. Also, the local state is restricted to the local variables and only those globals for which at least one protecting mutex is still held.
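The treatment of lock and unlock can be sketched on a toy relational domain. As an assumption made only for this illustration, a relation is an explicit set of full variable assignments over a two-element value set, so that meet is intersection, join is union, and restriction re-adds all assignments that agree on the kept variables. The names `VARS`, `VALUES`, `restrict`, `lock`, and `unlock`, and the parameters `protecting`, `held_after`, and `local_vars`, are hypothetical; the right-hand sides of Fig. 4 are not reproduced.

```python
from itertools import product

VARS = ("g", "h", "x")   # hypothetical program: globals g, h and one local x
VALUES = (0, 1)          # tiny value set so that relations can be enumerated
TOP = frozenset(product(VALUES, repeat=len(VARS)))


def restrict(r, keep):
    """r|_Y: forget all information about variables outside `keep`."""
    idx = [i for i, v in enumerate(VARS) if v in keep]
    seen = {tuple(s[i] for i in idx) for s in r}
    return frozenset(s for s in TOP if tuple(s[i] for i in idx) in seen)


def lock(local, eta, a, clusters):
    """Meet the local state with the values stored at all unknowns [a, Q]."""
    for q in clusters[a]:
        local = local & eta[(a, q)]
    return local


def unlock(local, eta, a, clusters, protecting, held_after, local_vars):
    """Side-effect local|_Q to every [a, Q], then forget unprotected globals."""
    for q in clusters[a]:
        eta[(a, q)] = eta[(a, q)] | restrict(local, q)
    protected = {g for g, mutexes in protecting.items() if mutexes & held_after}
    return restrict(local, set(local_vars) | protected)
```

For instance, if the unknown for mutex a and cluster {g, h} initially describes g = h = 0 and a thread writes 1 to both globals before unlocking a, a later lock(a) starting from ⊤ yields a local state in which g = h holds, while the unlocking thread itself retains no information about g and h.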

As special mutexes m<sub>g</sub> immediately surrounding accesses to g are used to ensure atomicity, and information about g is associated with them, all reads and writes refer to the local copy of g. Guards and assignments (which may only involve local variables) are defined analogously. For a return edge, the abstract value to be returned is looked up in the local state and then side-effected to the abstract thread id of the current thread (as the value of the dedicated variable ret). For join, the least upper bound of all return values of all possibly joined threads is assigned to the left-hand side of the join statement in the local state.

Example 4. Consider the program<sup>4</sup> where M[g] = {a, b, m<sub>g</sub>}, M[h] = {a, b, m<sub>h</sub>}, Q<sub>a</sub> = {{g, h}}, Q<sub>b</sub> = {{g, h}}.

```
main :
 x = create(t1); y = ?;
 lock(a); lock(b);
 g = y; h = y+9;
 unlock(b); lock(b);
 h = y;
 // ASSERT(g==y); (1)
 // ASSERT(h==y); (2)
 unlock(b); unlock(a);
 x = create(t2);

t1 :
 lock(b);
 unlock(b);
 lock(a);
 lock(b);
 // ASSERT(g==h); (3)
 y = ?; g = y; h = y;
 unlock(b);
 unlock(a);

t2 :
 lock(b);
 lock(a);
 // ASSERT(g==h); (4)
 unlock(a);
 unlock(b);
```
Our analysis succeeds in proving all assertions here. Thread t<sub>2</sub> is of particular interest: When locking b only g ≤ h is known to hold, and locking the additional mutex a means that the better information g = h becomes available. The analysis by Mukherjee et al. [42], on the other hand, only succeeds in proving assertion (2), even when all globals are put in the same region. It cannot establish (1) because all correlations between locals and globals are forgotten when the mix operation is applied at the second lock(b) in the main thread. (3) cannot be established because, at lock(b) in t<sub>1</sub>, the mix operation also incorporates the state after the first unlock(b) in the main thread, where g = h does not hold. Similarly, for (4). The same applies for assertion (3) and the analysis using lock invariants proposed by Miné [39]. This analysis also falls short of showing (1), as at the lock(b) in the main thread, the lock invariant associated with b is joined into the local state. (4) is similarly out of reach. The same reasoning also applies to [39, 42, 48] after equipping the analyses with thread ids. □

Theorem 1. Any solution of the constraint system is sound w.r.t. the local trace semantics.

Proof. The proof is by fixpoint induction; the details are given in Appendix B of the extended version [49] of this paper.

We remark that, instead of relying on M[g] being pre-computed, an analysis can also infer this information on the fly [58].

The analysis, however, still has some deficiencies. All writes to a global are accumulated regardless of the writing thread. As a consequence, a thread does, e.g., not only read its latest local writes but also all earlier local writes, even if those are definitely overwritten. Excluding some threads' writes is an instance of the more general idea of excluding writes that cannot be last writes. Instead of giving ad hoc remedies for this specific shortcoming, we propose a general mechanism to improve the precision of any thread-modular analysis in the next section, and later instantiate it to the issue highlighted here.

<sup>4</sup> In all examples, g, h, and i are globals, whereas x, y, and z are locals, and the clusters at special mutexes m<sub>g</sub> contain only g: Q<sub>m<sub>g</sub></sub> = {{g}}. Unless explicitly stated otherwise, domain R<sub>1</sub> from Example 2, enhanced with variable inequalities, is used.

# 5 Refinement via Finite Abstractions of Local Traces

To improve precision of thread-modular analyses we take additional abstractions of local traces into account. Our approach is generic, building on the right-hand sides of a base analysis and using them as black boxes. We will instantiate this framework to exclude writes based on thread ids from the analysis in Section 4. Other instantiations are conceivable as well. To make it widely applicable, the framework allows base analyses that already perform some splitting of unknowns at program points (e.g., locksets in Section 4). We denote by [û] such (possibly) extended unknowns for a program point u. A (base) analysis is defined by its right-hand sides, and a collection of domains: (1) D<sub>S</sub> for abstract values stored at unknowns for program points; (2) D<sub>act</sub> for abstract values stored at observable actions act (e.g., in Section 4, D<sub>M</sub> for unlocks and D<sub>T</sub> for thread returns).

Let A be a set of finite information that can be extracted from a local trace by a function α<sub>A</sub> : T → A. We call α<sub>A</sub> t ∈ A the digest of some local trace t. Let ⟦u, act⟧<sup>♯</sup><sub>A</sub> : A<sup>k</sup> → 2<sup>A</sup> be the effect on the digest when performing a k-ary action act ∈ Act for a control-flow edge originating at u. We require for e = (u, act, v) ∈ E,

$$\begin{aligned} \forall A\_0, \ldots, A\_{k-1} \in \mathcal{A} &: \left| \llbracket u, \mathsf{act} \rrbracket\_{\mathcal{A}}^{\sharp} (A\_0, \ldots, A\_{k-1}) \right| \leq 1 \\ \forall t\_0, \ldots, t\_{k-1} \in \mathcal{T} &: \alpha\_{\mathcal{A}}(\llbracket e \rrbracket (t\_0, \ldots, t\_{k-1})) \subseteq \llbracket u, \mathsf{act} \rrbracket\_{\mathcal{A}}^{\sharp} (\alpha\_{\mathcal{A}} \, t\_0, \ldots, \alpha\_{\mathcal{A}} \, t\_{k-1}) \end{aligned} \tag{4}$$

where α<sub>A</sub> is lifted element-wise to sets. While the first restriction ensures determinism, the second intuitively ensures that ⟦u, act⟧<sup>♯</sup><sub>A</sub> soundly abstracts ⟦e⟧.

For thread creation, we additionally require a helper function new<sup>♯</sup><sub>A</sub> : N → N → A → 2<sup>A</sup> that returns, for a thread created at an edge originating from u and starting execution at program point u<sub>1</sub>, the new digest. The same requirements are imposed for edges (u, x=create(u<sub>1</sub>), v) ∈ E,

$$\forall A\_0 \in \mathcal{A}: \left| \mathsf{new}\_{\mathcal{A}}^\sharp \, u \, u\_1 \, A\_0 \right| \le 1 \quad \forall t\_0 \in \mathcal{T}: \alpha\_{\mathcal{A}}(\mathsf{new} \, u \, u\_1 \, t\_0) \subseteq \mathsf{new}\_{\mathcal{A}}^\sharp \, u \, u\_1 \, (\alpha\_{\mathcal{A}} \, t\_0) \tag{5}$$

Also, we defne for the initial digest at the start of the program

$$\mathsf{init}\_{\mathcal{A}}^{\sharp} = \{ \alpha\_{\mathcal{A}} \, t \mid t \in \mathsf{init} \} \tag{6}$$

Under these assumptions, we can perform control-point splitting according to A. This means that unknowns [û] for program points u are replaced with new unknowns [û, A], A ∈ A. Analogously, unknowns for observable actions [act] are replaced with unknowns [act, A] for A ∈ A. Consider a single constraint from an abstract constraint system of the last section, which soundly abstracts the collecting local trace semantics of a program.

$$(\eta, \eta \left[ \hat{v} \right]) \sqsupseteq \llbracket \left[ \hat{u} \right], \mathsf{act} \rrbracket^{\sharp}\, \eta$$


Fig. 5: Right-hand sides of the refined analyses for an observing action act, an observable action act′, a create action, and an action act″ that is neither, defined as wrappers around the right-hand sides of a base analysis.

The corresponding constraints of the refined system with control-point splitting differ based on whether the action act is observing, observable, or neither.

– When act is observing, the new right-hand side additionally gets the digest A<sub>1</sub> associated with the local traces that are to be incorporated:

$$(\eta, \eta \left[\hat{v}, A'\right]) \sqsupseteq \llbracket [\hat{u}, A\_0], \mathsf{act}, A\_1 \rrbracket^{\sharp}\, \eta \qquad \text{for } A\_0, A\_1 \in \mathcal{A}, A' \in \llbracket u, \mathsf{act} \rrbracket\_{\mathcal{A}}^{\sharp} (A\_0, A\_1)$$

– When act is observable, the digest A′ of the resulting local trace is passed, so the side-effect can be redirected to the appropriate unknown:

$$(\eta, \eta \left[\hat{v}, A'\right]) \sqsupseteq \llbracket [\hat{u}, A\_0], \mathsf{act}, A' \rrbracket^{\sharp}\, \eta \qquad \text{for } A\_0 \in \mathcal{A}, A' \in \llbracket u, \mathsf{act} \rrbracket\_{\mathcal{A}}^{\sharp} (A\_0)$$

– When act is neither, no additional digest is passed:

$$(\eta, \eta \left[\hat{v}, A'\right]) \sqsupseteq \llbracket [\hat{u}, A\_0], \mathsf{act} \rrbracket^{\sharp}\, \eta \qquad \text{for } A\_0 \in \mathcal{A}, A' \in \llbracket u, \mathsf{act} \rrbracket\_{\mathcal{A}}^{\sharp} (A\_0)$$

The new right-hand sides are defined in terms of the right-hand sides of the base analysis, which are used as black boxes (Fig. 5). They act as wrappers, mapping any unknown consulted or side-effected to by the original analysis to the appropriate unknown of the refined system. Thus, the refined analysis automatically benefits from the extra information the digests provide. It may, e.g., exploit that ⟦u, act⟧<sup>♯</sup><sub>A</sub>(A<sub>0</sub>, A<sub>1</sub>) = ∅, meaning that no local traces with digests A<sub>0</sub>, A<sub>1</sub> can be combined into a valid local trace ending with action act. The complete definition of the refined constraint system instantiated to the actions considered here and unknowns for program points enriched with locksets is given in [49, Fig. 14].

Enriching program points with locksets can in fact be seen as a first application of this framework. The right-hand sides are given in Fig. 6.
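The lockset digest of Fig. 6 is small enough to be written out directly. The sketch below transliterates the figure into functions that each return a set containing at most one digest, as demanded by requirement (4); the function names and the explicit `a`/`act` parameters are our own illustrative choices, and digests are represented as `frozenset`s of mutex names.

```python
def init_digest():
    """The main thread starts with the empty lockset."""
    return {frozenset()}


def new_digest(u, u1, lockset):
    """A thread created at u and starting at u1 holds no mutexes."""
    return {frozenset()}


def lock_digest(u, a, lockset, other_lockset):
    """Observing action lock(a): the digest of the incorporated trace is ignored."""
    return {lockset | {a}}


def unlock_digest(u, a, lockset):
    """Observable action unlock(a): remove a from the lockset."""
    return {lockset - {a}}


def other_digest(u, act, lockset, *incorporated):
    """All remaining actions leave the lockset unchanged."""
    return {lockset}
```

Since every result is a singleton, splitting unknowns according to this digest yields exactly the unknowns [u, S] used in Section 4.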

Example 5. As a further instance, consider tracking which mutexes have been locked at least once in the local trace. At lock(a), traces in which a thread has performed a lock(a) cannot be combined with traces that contain no lock(a). The corresponding right-hand sides are given in Fig. 7. When refining the analysis from Section 4 accordingly (assuming a protects g and h), it succeeds in proving the assert in this program as the initial values of 0 for g and h can be excluded.


This naturally generalizes to counting how often some action (e.g., a write to a global g) occurred, stopping exact bookkeeping at a constant (1 in this case). □

To prove soundness of local-trace-based refinement of our analysis from Section 4, we first construct a corresponding refined collecting local trace semantics. Then we verify that the refined analysis is sound w.r.t. this refined semantics, which, in turn, is proven sound w.r.t. the original collecting local trace semantics.

Theorem 2. Assume that α<sub>A</sub>, new<sup>♯</sup><sub>A</sub>, and ⟦u, act⟧<sup>♯</sup><sub>A</sub> fulfill requirements (4), (5), and (6). Then any solution of the refined constraint system is sound relative to the collecting local trace semantics.

Proof. A proof sketch instantiated with the actions considered here and unknowns enriched with locksets is provided in [49, Appendix D].

# 6 Analysis of Thread Ids and Uniqueness

We instantiate the scheme from the previous section to compute abstract thread ids and their uniqueness. This refinement of the base analysis enhances precision by excluding reads, e.g., from threads that have not yet been started. For that, we identify threads by their thread creation history, i.e., by sequences of create edges. As these sequences may grow arbitrarily, we collect all creates occurring after the first repetition into a set to obtain finite abstractions.

Example 6. In the program from Fig. 8, the first thread created by main receives the abstract thread id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, ∅). It creates a thread with abstract thread id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩ · ⟨u<sub>3</sub>, t<sub>1</sub>⟩, ∅). At program point u<sub>3</sub>, the latter creates a thread starting at t<sub>1</sub> and receiving the abstract thread id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, {⟨u<sub>3</sub>, t<sub>1</sub>⟩}), as do all threads subsequently created at this edge. □

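$$\begin{array}{ll} \mathsf{init}\_{\mathcal{A}}^{\sharp} = \{\emptyset\} & \\\ \mathsf{new}\_{\mathcal{A}}^{\sharp}\, u\, u\_1\, S = \{\emptyset\} & \\\ \llbracket u, a \rrbracket\_{\mathcal{A}}^{\sharp}\, S = \{S\} & \text{(other non-observing)} \\\ \llbracket u, \mathsf{lock}(a) \rrbracket\_{\mathcal{A}}^{\sharp}\, (S, S') = \{S \cup \{a\}\} & \\\ \llbracket u, \mathsf{unlock}(a) \rrbracket\_{\mathcal{A}}^{\sharp}\, S = \{S \setminus \{a\}\} & \\\ \llbracket u, a \rrbracket\_{\mathcal{A}}^{\sharp}\, (S, S') = \{S\} & \text{(other observing)} \end{array}$$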

Fig. 6: Right-hand sides for expressing locksets as a refinement.

Create edges, however, may also be repeatedly encountered within the creating thread, in a loop. To deal with this, we track for each thread, the set C of possibly already encountered create edges. As soon as a create edge is encountered again, the created thread receives a non-unique thread id.

Example 7. The first time the main thread reaches program point u<sub>2</sub> in the program from Fig. 8, the created thread is assigned the unique abstract thread id (main · ⟨u<sub>2</sub>, t<sub>1</sub>⟩, ∅). In subsequent loop iterations, the created threads are no longer kept separate, and thus receive the non-unique id (main, {⟨u<sub>2</sub>, t<sub>1</sub>⟩}). □

Formally, let N<sub>C</sub>, N<sub>S</sub> denote the subsets of program points with an outgoing edge labeled x=create(...), and of starting points of threads, respectively. Let P ⊆ N<sub>C</sub> × N<sub>S</sub> denote the set of pairs relating thread creation nodes with the starting points of the created threads. The set I<sup>♯</sup> of abstract thread ids then consists of all pairs (i, s) ∈ (main · P<sup>∗</sup>) × 2<sup>P</sup> in which each pair ⟨u, f⟩ occurs at most once. Given the set I<sup>♯</sup>, we require that there is a concretization γ : I<sup>♯</sup> → 2<sup>I</sup> and a function single : I<sup>♯</sup> → V<sup>♯</sup><sub>I</sub> with γ i<sup>♯</sup> ⊆ γ<sub>V♯</sub>(single i<sup>♯</sup>). The abstract thread id of the main thread is given by (main, ∅). Therein, the elements in (main · P<sup>∗</sup>) × {∅} are the unique thread ids, each representing at most one concrete thread id, while the elements (i, s), s ≠ ∅, are ambiguous, i.e., may represent multiple concrete thread ids. Moreover, we maintain the understanding that the concretizations of distinct abstract thread ids from I<sup>♯</sup> all are disjoint.

As refining information A we consider not only abstract thread ids but additionally track sets of executed thread creations within the current thread. Accordingly, we set A = I<sup>♯</sup> × 2<sup>P</sup> and define the right-hand sides as seen in Fig. 9, where ī denotes the set of pairs occurring in the sequence i.

Example 8. Consider again the program from Fig. 8 with right-hand sides from Fig. 9, and assume that the missing right-hand side for join returns its first argument. The initial thread has the abstract thread id i<sub>0</sub> = (main, ∅). At its start point, the digest thus is (i<sub>0</sub>, ∅). At the create edge originating at u<sub>1</sub>, a new thread with id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, ∅) is created. The digest for this thread then is ((main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, ∅), ∅). For the main thread, the encountered create edge ⟨u<sub>1</sub>, t<sub>1</sub>⟩ is added to the second component of the digest, making it (i<sub>0</sub>, {⟨u<sub>1</sub>, t<sub>1</sub>⟩}).

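When u<sub>2</sub> is reached with (i<sub>0</sub>, {⟨u<sub>1</sub>, t<sub>1</sub>⟩}), a unique thread with id (main · ⟨u<sub>2</sub>, t<sub>1</sub>⟩, ∅) is created. The new digest of the creating thread then is (i<sub>0</sub>, {⟨u<sub>1</sub>, t<sub>1</sub>⟩, ⟨u<sub>2</sub>, t<sub>1</sub>⟩}). In subsequent iterations of the loop, for which u<sub>2</sub> is reached with (i<sub>0</sub>, {⟨u<sub>1</sub>, t<sub>1</sub>⟩, ⟨u<sub>2</sub>, t<sub>1</sub>⟩}), a non-unique thread with id (main, {⟨u<sub>2</sub>, t<sub>1</sub>⟩}) is created.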


Fig. 7: Right-hand sides for refining according to encountered lock operations.

When reaching u<sub>3</sub> with id (main, {⟨u<sub>2</sub>, t<sub>1</sub>⟩}), a thread with id (main, {⟨u<sub>2</sub>, t<sub>1</sub>⟩, ⟨u<sub>3</sub>, t<sub>1</sub>⟩}) is created as the id of the creating thread was already not unique. When reaching it with the id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, ∅), a new thread with id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩ · ⟨u<sub>3</sub>, t<sub>1</sub>⟩, ∅) is created. When the newly created thread reaches this program point, the threads created there have the non-unique id (main · ⟨u<sub>1</sub>, t<sub>1</sub>⟩, {⟨u<sub>3</sub>, t<sub>1</sub>⟩}), as ⟨u<sub>3</sub>, t<sub>1</sub>⟩ already appears in the id of the creating thread. □
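The id computations walked through in Examples 6 to 8 can be replayed with a small sketch. It is reconstructed from the behaviour described in these examples only and is not claimed to coincide with the definitions of Fig. 9. As illustrative assumptions, an abstract thread id is a pair of a tuple of create edges (the history after main) and a `frozenset` of create edges, and the names `compose`, `create`, and `unique` are hypothetical.

```python
MAIN = ((), frozenset())  # abstract thread id (main, {}) of the main thread


def unique(tid):
    """An abstract thread id (i, s) is unique iff s is empty."""
    return not tid[1]


def compose(tid, edge):
    """Extend the abstract thread id `tid` by the create edge `edge`."""
    history, extras = tid
    if not extras and edge not in history:
        return (history + (edge,), frozenset())  # still a unique id
    if edge in history:
        cut = history.index(edge)
        # collapse everything from the first repetition onwards into the set
        return (history[:cut], extras | set(history[cut:]) | {edge})
    return (history, extras | {edge})


def create(tid, seen, edge):
    """Return the id of the created thread and the new digest of the creator."""
    if edge in seen:
        child = (tid[0], tid[1] | {edge})  # edge encountered before: non-unique
    else:
        child = compose(tid, edge)
    return child, (tid, seen | {edge})


E1, E2, E3 = ("u1", "t1"), ("u2", "t1"), ("u3", "t1")
```

Calling `create(MAIN, frozenset(), E1)` yields the unique id with history ⟨u<sub>1</sub>, t<sub>1</sub>⟩ together with the creator digest (i<sub>0</sub>, {⟨u<sub>1</sub>, t<sub>1</sub>⟩}), mirroring the first step of Example 8; a newly created thread starts with an empty set of encountered create edges.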

Abstract thread ids should provide us with functions


For our domain I<sup>♯</sup>, these can be defined as unique (i, s) = (s = ∅) and

$$\begin{array}{ll} \mathsf{lcu\_anc}\,(i,s)\,(i',s') &= \left(\mathsf{longest\ common\ prefix}\ i\ i',\ \emptyset\right)\\ \mathsf{may\_create}\,(i,s)\,(i',s') &= \left(\bar{i}\cup s\right)\subseteq\left(\bar{i'}\cup s'\right) \end{array}$$

We use this extra information to enhance the definitions of ⟦u, lock(a)⟧<sup>♯</sup><sub>A</sub> and ⟦u, x′=join(x)⟧<sup>♯</sup><sub>A</sub> to take into account that the ego thread cannot acquire a mutex from another thread or join a thread that has definitely not yet been created. This is the case for a thread t′


Accordingly, we introduce the predicate may\_run (i, C) (i′, C′) defined as

$$(\mathsf{lcu\_anc}\,i\,i'=i) \implies \exists\langle u,u'\rangle \in C: (i\circ\langle u,u'\rangle = i'\lor\mathsf{may\_create}\,(i\circ\langle u,u'\rangle)\,i')$$

which is false whenever thread i′ is definitely not yet started. We then set

$$\begin{aligned} [\![ u, \mathsf{lock}(a) ]\!]^{\sharp}_{\mathcal{A}}\,(i, C)\,(i', C') &= [\![ u, x' = \mathsf{join}(x) ]\!]^{\sharp}_{\mathcal{A}}\,(i, C)\,(i', C') \\ &= \begin{cases} \{ (i, C) \} & \text{if } \mathsf{may\_run}\,(i, C)\,(i', C') \\ \emptyset & \text{otherwise} \end{cases} \end{aligned}$$
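As a rough illustration of how these predicates interact, the following Python sketch encodes abstract thread ids as pairs of a tuple of create edges and a set of create edges (the `main` prefix is implicit). The encoding, the helper `compose` (the operator ∘ from Fig. 9), and the node name `u9` used below are assumptions made for this sketch only.

```python
MAIN = ((), frozenset())                      # abstract id (main, {})


def compose(tid, e):
    """(d, s) o e, following the definition in Fig. 9."""
    d, s = tid
    if e in d:
        k = d.index(e)
        return (d[:k], s | frozenset(d[k + 1:]) | {e})
    if not s:
        return (d + (e,), frozenset())
    return (d, s | {e})


def unique(tid):
    """unique (i, s) = (s = {})."""
    return not tid[1]


def lcu_anc(t1, t2):
    """Least common unique ancestor: longest common prefix, with empty set part."""
    d1, d2 = t1[0], t2[0]
    n = 0
    while n < min(len(d1), len(d2)) and d1[n] == d2[n]:
        n += 1
    return (d1[:n], frozenset())


def may_create(t1, t2):
    """All create edges recorded in t1 also occur in t2."""
    return (frozenset(t1[0]) | t1[1]) <= (frozenset(t2[0]) | t2[1])


def may_run(ego, other):
    """False only if the other thread has definitely not been started yet."""
    (i, created), (i2, _) = ego, other
    if lcu_anc(i, i2) != i:                   # ego is not a unique ancestor of other
        return True
    return any(compose(i, e) == i2 or may_create(compose(i, e), i2)
               for e in created)
```

For a main thread that has so far only executed the create edge `("u1", "t1")`, `may_run` returns `False` for a thread with id `main · ("u9", "t3")`, so its contributions at `lock(a)` would be ignored; once `("u9", "t3")` is in the set of executed creations, the predicate returns `True`.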

This analysis of thread ids and uniqueness can be considered as a May-Happen-In-Parallel (or, more precisely, Must-Not-Happen-In-Parallel) analysis. MHP information is useful in a variety of scenarios: a thread-modular analysis of data races or deadlocks, e.g., that does not consider thread ids and joining, can be refined with this analysis to exclude more data races or deadlocks. Subsequently, we outline how the analysis from Section 4 may benefit from MHP information.

```
main :
 x = g; // PP u1
 y = create(t1);
 for(i = 0; i < 5; i++) { // PP u2
   z = create(t1); }

t1 :
 g = 42; // PP u3
 y = create(t1);
```
Fig. 8: Program with multiple thread creations.

# 7 Exploiting Thread IDs to Improve Relational Analyses

We subsequently exploit abstract thread ids and their uniqueness to limit the amount of reading performed by the analysis from Section 4.


Improvements I1 and I3 have, e.g., been realized in a setting where thread ids and which thread is joined where can be read off from control-flow graphs [31]. Here, however, this information is computed during analysis. In our framework, I1 is already achieved by refining the base analysis according to Section 6.

Example 9. Consider the program below where M[g] = {a, b, m<sub>g</sub>}, M[h] = {a, b, m<sub>h</sub>}, M[i] = {m<sub>i</sub>} and assume Q<sub>a</sub> = {{g, h}}.

```
main :
 x = create(t1); lock(a);
 // ASSERT(g==h); (1)
 unlock(a);
 y = create(t2); lock(a);
 // ASSERT(g==h); (2)
 g = 42; h = 42;
 unlock(a); z = create(t3);
 i = 3; i = 2; // ASSERT(i==2); (3)
 i = 8;

t1 :
 lock(a);
 r = ?; g = r; h = r;
 unlock(a);

t2 :
 lock(a); v = g; unlock(a);

t3 :
 lock(a); g = 19; unlock(a);
```
The analysis succeeds in proving (1), as the thread (starting at) t<sub>3</sub> that breaks the invariant g=h has definitely not been started yet at this program point. Without refinement, the analysis from Section 4 could not prove (1). However, this does not suffice to prove (2). At this program point, t<sub>2</sub> may already be started. At the lock(a) in t2, t<sub>3</sub> may also be started; thus, the violation of the invariant g=h by t<sub>3</sub> is incorporated into the local state of t<sub>2</sub> at lock. At unlock(a), despite t<sub>2</sub> only reading g, the imprecise abstract relation violating g=h is side-effected to [a, {g, h}, t2] and is incorporated at the second lock(a) of the main thread. The final shortcoming is that each thread reads all its own past (and future!) writes – even when it is known to be unique. This means that (3) cannot be proven. ⊓⊔

```
init♯_A = {((main, ∅), ∅)}
⟦u, x=create(u1)⟧♯_A (i, C) = {(i, C ∪ {⟨u, u1⟩})}
⟦u, a⟧♯_A (i, C) = {(i, C)}        (for other actions a)

new♯_A u u1 ((d, s), C) =
  let (d′, s′) = (d, s) ◦ ⟨u, u1⟩ in
  if s′ = ∅ ∧ ⟨u, u1⟩ ∈ C then ((d, {⟨u, u1⟩}), ∅)
  else ((d′, s′), ∅)

(d, s) ◦ ⟨u, u1⟩ =
  if d = (d0 · ⟨u, u1⟩) · d1 then
    (d0, s ∪ d̄1 ∪ {⟨u, u1⟩})
  else if s = ∅ then (d · ⟨u, u1⟩, ∅)
  else (d, s ∪ {⟨u, u1⟩})
```
Fig. 9: Right-hand sides for thread ids.

To achieve I2, some effort is required as our analysis forgets values of globals when they become unprotected. This is in contrast, e.g., to [39, 42]. We thus restrict side-effecting to mutexes to cases where the ego thread has possibly written a protected global since acquiring it. This is in contrast to Section 4, where a side-effect is performed at every unlock, i.e., everything a thread reads is treated as if it was written by that thread.

Technically, we locally track a map L : (M × Q) → R, where L(a, Q) maintains for a mutex a, an abstract relation between the globals in cluster Q ∈ Q<sub>a</sub>. More specifically, the abstract relation on the globals from Q recorded in L(a, Q) is the one that held when a was unlocked join-locally for the first time after the last join-local write to a global in G[a]. If there is no such unlock(a), the relation at program start is recorded. We call an operation in a local trace join-local to the ego thread, if it is (a) thread-local, i.e., performed by the ego thread, or (b) is executed by a thread that is (transitively) joined into the ego thread, or (c) is join-local to the parent thread at the node at which the ego thread is created. This notion will also be crucial for realizing I3. Join-locality is illustrated in Fig. 10, where the join-local part of a local trace is highlighted.

For join-local contributions, it suffices to consult L a instead of unknowns [a, Q, i]. Such contributions are said to be accounted for. To check whether a contribution from some thread id is accounted for, we introduce a function acc : (A × D<sub>S</sub>) → A → bool (see definition (7) below). Besides an abstract value from R, the local state D<sub>S</sub> now contains two additional components:


Just like r, L and W are abstractions of the reaching local traces. D<sub>T</sub> is also enhanced with an L component, while D<sub>M</sub> remains unmodified. We sketch the right-hand sides here; definitions are given in Fig. 11. For program start init<sup>♯</sup>, in contrast to the analysis from Section 4, there is no initial side-effect to the unknowns for mutexes. The initial values of globals are join-local, and thus accounted for in the L component also passed to any subsequently created thread.

The right-hand sides for thread creation and return differ from the analysis from Section 4 enhanced with thread ids only in the handling of additional data structures L and W. As the thread ids are tracked precisely in the A component, this information is directly used when determining which unknown to side-effect to and unknowns [(i, C)] replace unknowns [i′, (i, C)].

For join, if the return value of the thread is not accounted for, it is assigned to the variable on the left-hand side and the L information from the ego thread and the joined thread is joined. If, on the other hand, it is accounted for, the thread has already been joined and cannot be joined again. There is a separate constraint for each (i′, C′), so that all threads that could be joined are considered.

For locking of mutexes, upon lock, if (i′, C′) is not accounted for, its information on the globals protected by a is joined with the join-local information for a maintained in L(a, Q), Q ∈ Q<sub>a</sub>. This information about the globals protected by a is then incorporated into the local state by ⊓. For unlocking of mutexes, if there may have been a write to a protected global since the mutex was locked (according to W), the join-local information is updated and the local state restricted to Q is side-effected to the appropriate unknown [a, Q, (i, C)] for Q ∈ Q<sub>a</sub>. Just like in Section 4, r is then restricted to only maintain relationships between locals and those globals for which at least one protecting mutex is still held. Reading from and writing to globals once more are purely local operations. To exclude self writes, we set

$$\mathsf{acc}\left( (i,C),\_\right)(i',C') = \mathsf{unique}\ i \wedge i = i'\tag{7}$$

The resulting analysis thus takes I1 (via ⟦...⟧<sup>♯</sup><sub>A</sub> defined in Section 6), as well as I2 (via acc) into account. In Example 9, it is now able to show all assertions.

Theorem 3. This analysis is sound w.r.t. the local trace semantics.

Proof. The proof relies on the following observations:


The detailed proof is a simplification of a proof for an enhanced analysis from the extended version [49, Appendix F], which we outline in Appendix G there. ⊓⊔

The analysis does not make use of components C at unknowns [a, Q,(i, C)] and [i, C]. In [49, Appendix E], we detail how this information can be exploited to exclude a further class of writes – namely, those that are performed by an ancestor of the ego thread before the ego thread was created. Alternatively, an implementation may abandon control-point splitting according to C at mutexes and thread ids, replacing [a, Q,(i, C)], [i, C] with [a, Q, i] and [i], respectively.

Fig. 10: Illustration highlighting the join-local part of a local trace of the program from Fig. 2a, and which writes are thus accounted for by L.


Fig. 11: Right-hand sides for the improved (I1, I2) analysis using thread ids.

When turning to improvement I3, we observe that after joining a thread t with a unique thread id, t cannot perform further writes. As all writes of joined threads are join-local to the ego thread, it is not necessary to read from the corresponding global unknowns. We therefore enhance the analysis to also track in the local state the set J of thread ids for which join has definitely been called in the join-local part of the local trace, and refine acc to take J into account:

$$\mathsf{acc}\left( (i,C), (J,L,W,r) \right) \left( i',C' \right) = \mathsf{unique}\, i' \land \left( i = i' \lor i' \in J \right)$$
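The two variants of acc can be contrasted in a short Python sketch. It assumes, for illustration only, that an abstract thread id is a pair of a tuple of create edges and a set of create edges, that a digest is a pair (id, C), and that the local state is a tuple (J, L, W, r); the function names `acc_self` and `acc_joined` are hypothetical.

```python
def unique(tid):
    """An abstract thread id (d, s) is unique iff its set component s is empty."""
    return not tid[1]


def acc_self(digest, _state, other):
    """Definition (7): only writes of a unique ego thread itself are accounted for."""
    (i, _c), (i2, _c2) = digest, other
    return unique(i) and i == i2


def acc_joined(digest, state, other):
    """Refinement for I3: also account for unique threads that were definitely joined."""
    (i, _c), (i2, _c2) = digest, other
    joined, _l, _w, _r = state          # state = (J, L, W, r)
    return unique(i2) and (i == i2 or i2 in joined)
```

Both functions agree when J is empty; a contribution for which the predicate holds need not be read from the global unknowns, since it is already covered by the join-local information in L.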

The extended version [49, Appendix F] gives details on this enhancement.

# 8 Exploiting Clustered Relational Domains

Naïvely, one might assume that tracking relations among a larger set of globals is necessarily more precise than between smaller sets. Interestingly, this is no longer true for our analyses, e.g., in presence of thread ids. A similar effect where relating more globals can deteriorate precision has also been observed in the context of an analysis using a data-flow graph to model interferences [19].

Example 10. Consider again Example 1 in the introduction with Q<sub>a</sub> = {{g, h, i}}. For this program, the constraint system of the analysis has a unique least solution. It verifies that assertion (1) holds. It assures for [a, {g, h, i}, t1] that h=i holds, and for the main thread and the program point before each assertion that L(a, {g, h, i}) = {g=h, h=i} holds, while for [a, {g, h, i}, main] and [a, {g, h, i}, t2] only ⊤ is recorded, as is for any relation associated with m<sub>g</sub>, m<sub>h</sub>, or m<sub>i</sub>. Assertion (2), however, will not succeed, as the side-effect from t<sub>1</sub> causes the older values from the first write in the main thread to be propagated to the assertions as well, implying that while h=i is proven, g=h is not. ⊓⊔

Intuitively, the analysis loses precision because, at an unlock of mutex a, the current relationships between all clusters protected by a are side-effected. As soon as one global is written to, the analysis behaves as if all protected globals had been written. By limiting publishing to those clusters for which at least one global has been written, more precise information may remain at others.

In the improved analysis, when unlocking a mutex a, side-effects are only produced to clusters Q ∈ Q<sub>a</sub> containing at least one global that was written to since the last lock(a). Definitions for locking and unlocking are given in Fig. 12.
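The selection of clusters at unlock can be illustrated with a toy Python sketch. It is not the paper's domain: as a stand-in for the relational domain R, a "relation" here is simply a finite set of concrete valuations, and the names `restrict` and `unlock_side_effects` are assumptions introduced for this example.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Valuation = FrozenSet[Tuple[str, int]]   # one assignment, e.g. {("g", 1), ("h", 1)}
Relation = FrozenSet[Valuation]          # toy relational value: a set of valuations
Cluster = FrozenSet[str]


def restrict(r: Relation, cluster: Cluster) -> Relation:
    """Project every valuation onto the variables of the cluster (r|_Q)."""
    return frozenset(
        frozenset((x, v) for (x, v) in val if x in cluster) for val in r
    )


def unlock_side_effects(
    mutex: str,
    clusters: Iterable[Cluster],
    written: FrozenSet[str],
    r: Relation,
    digest,
) -> Dict[tuple, Relation]:
    """Publish r|_Q only for clusters Q containing a possibly written global."""
    touched = [q for q in clusters if q & written]
    return {(mutex, q, digest): restrict(r, q) for q in touched}
```

If only `g` has been written since the last `lock(a)`, a cluster `{h, i}` receives no side-effect, so a relation such as h=i recorded there by other threads is not weakened by this unlock.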

For locking the mutex a, the abstract value to be incorporated into the local state is assembled from the contributions of different threads to the clusters. For that, the separate constraints for each admitted digest from Section 5 are combined into one for the set I = {(i′, C′) | (i, C) ∈ ⟦lock(a)⟧<sup>♯</sup><sub>A</sub> ((i, C), (i′, C′))} of all admitted digests. This is necessary as side-effects to unaffected clusters at unlock(a) have been abandoned and thus the meet with the values for clusters of one thread at a time is unsound. For each Q, the join-local information L(a, Q) is joined with all contributions to Q by threads that are not yet accounted for, but admitted for Q by the digests. Here, the contribution of a thread that does not write Q is ⊥, and thus does not affect the value for Q. Finally, the resulting value is used to improve the local state by meet. The right-hand side for lock(a) thus exploits the fine-grained, per-cluster MHP information provided by the digests and the predicate acc. We obtain:

Theorem 4. Given domains R and V<sup>♯</sup> fulfilling the requirements from Fig. 1, any solution of the constraint system is sound w.r.t. the local trace semantics. Maximum precision is obtained with Q<sub>a</sub> = 2<sup>G[a]</sup>. ⊓⊔

For Example 1, with Q<sub>a</sub> = 2<sup>G[a]</sup>, both assertions are verified. Performing the analysis with all subclusters simultaneously can be expensive when sets G[a] are large. The choice of subclustering thus generally involves a trade-off between precision and runtime. This is different for k-decomposable relational domains:

Theorem 5. Provided the relational domain is k-decomposable (Equation (2)), the clustered analysis using only subclusters of size at most k is equally precise as the clustered analysis using all subclusters Q<sub>a</sub> = 2<sup>G[a]</sup> at mutexes a.

Proof. Consider a solution η of the constraint system with Q<sub>a</sub> = 2<sup>G[a]</sup>. Then for unknowns [a, Q, (i, C)] and [a, Q′, (i, C)] with Q ⊆ Q′ and |Q| ≤ k, and values r = η [a, Q, (i, C)], r′ = η [a, Q′, (i, C)], we have that r ⊑ r′|<sub>Q</sub> (whenever the smaller cluster receives a side-effect, so does the larger one). Thus, by k-decomposability, the additional larger clusters Q′ do not improve the meet over the clusters of size at most k for individual thread ids as well as the meet of their joins over all thread ids. The same also applies to the clustered information stored in L. ⊓⊔

$$\begin{aligned}
&[\![ [u, S, (i, C)],\ \mathsf{unlock}(a),\ (i, C) ]\!]^{\sharp}\,\eta = \\
&\quad \mathsf{let}\ (L, W, r) = \eta\,[u, S, (i, C)]\ \mathsf{in} \\
&\quad \mathsf{let}\ \mathcal{Q}' = \{ Q \mid Q \in \mathcal{Q}_a,\ Q \cap W \neq \emptyset \}\ \mathsf{in} \\
&\quad \mathsf{let}\ L' = L \oplus \{ (a, Q) \mapsto r|_{Q} \mid Q \in \mathcal{Q}' \}\ \mathsf{in} \\
&\quad \mathsf{let}\ \rho = \{ [a, Q, (i, C)] \mapsto r|_{Q} \mid Q \in \mathcal{Q}' \}\ \mathsf{in} \\
&\quad \mathsf{let}\ r' = r|_{X \cup \bigcup \{ G[a'] \mid a' \in (S \setminus a) \}}\ \mathsf{in} \\
&\quad \mathsf{let}\ W' = \{ g \mid g \in W,\ M[g] \cap S \setminus \{a\} \neq \emptyset \}\ \mathsf{in} \\
&\quad (\rho, (L', W', r')) \\[1ex]
&[\![ [u, S, (i, C)],\ \mathsf{lock}(a),\ I ]\!]^{\sharp}\,\eta = \\
&\quad \mathsf{let}\ (L, W, r) = \eta\,[u, S, (i, C)]\ \mathsf{in} \\
&\quad \mathsf{let}\ l = ((i, C), (L, W, r))\ \mathsf{in} \\
&\quad \mathsf{let}\ J(Q) = \bigsqcup \{ \eta\,[a, Q, (i', C')] \mid (i', C') \in I,\ \neg\,\mathsf{acc}\ l\ (i', C') \}\ \mathsf{in} \\
&\quad \mathsf{let}\ r' = \bigsqcap_{Q \in \mathcal{Q}_a} (J(Q) \sqcup L(a, Q))\ \mathsf{in} \\
&\quad (\emptyset, (L, W, r \sqcap r'))
\end{aligned}$$

Fig. 12: Right-hand sides for unlocking and locking when limiting side-effecting to potentially written clusters.

Example 11. Consider again Example 1. If the analysis is performed with clusters Q<sub>a</sub> = {{h, i}, {g, h}, {g, i}, {g}, {i}, {h}} both assertions can be proven. ⊓⊔
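The cluster set used in Example 11 is exactly the family of non-empty subsets of G[a] of size at most k = 2. A minimal Python sketch for enumerating such a family is given below; the function name `subclusters` is an assumption for illustration and does not refer to the Goblint implementation.

```python
from itertools import combinations


def subclusters(protected, k):
    """All non-empty subsets of the protected globals with at most k elements."""
    names = sorted(protected)
    return [
        frozenset(combo)
        for size in range(1, k + 1)
        for combo in combinations(names, size)
    ]
```

For three protected globals this gives 6 clusters for k = 2 instead of the 7 non-empty subsets, but the gap widens quickly: the number of clusters of size at most 2 grows quadratically in |G[a]|, while the number of all subclusters grows exponentially.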

The one element clusters, on the other hand, cannot be abandoned – as indicated by the example from Appendix H in the extended version [49].

# 9 Experimental Evaluation

We implemented [50] the analyses extending the context-sensitive static analyzer Goblint, which provides the set of protecting mutexes for each global. The implementation tracks information about integral variables using either the Interval or the Octagon domains from Apron [29]. A comparison with other tools is difficult; for details see [49, Appendix I]:


We considered four different configurations, namely, Interval: the analysis from Section 4 with Intervals; Octagon: the same analysis with Octagons; TIDs: the analysis from Section 7 with enhancement [49, Appendix F] with Octagons; Clusters: TIDs using clusters of size at most 2 only. All benchmarks were run in a virtual machine on an AMD EPYC 7742 64-Core processor<sup>5</sup> running Ubuntu 20.04. The results of our evaluation are summarized in Table 2.

Table 2: Summary of evaluation results, with individual programs grouped together. For each group the number of programs and the total number of assertions are given. ✓ (✗) indicates that all (no) assertions are proven, otherwise the number of proven assertions is given. (—) indicates invalid results produced.

Our benchmarks. To capture particular challenges for multi-threaded relational analysis, we collected a set of small benchmarks (including the examples from this paper) and added assertions. On these, we evaluated our analyzer, NR-Goblint, and Duet. Our analysis in the Clusters configuration is capable of verifying all the programs. The other tools could only prove a handful of relational assertions.

Goblint benchmarks [48]. These benchmarks do not contain assertions. To still relate the precision of our analyzer to the non-relational NR-Goblint and to Duet, we used our tool in the Clusters setting to automatically derive invariants at each locking operation. Perhaps surprisingly, NR-Goblint could verify 95% of the invariants despite being non-relational and not using thread ids.

Watts benchmarks [31]. These programs were instrumented with asserts and significantly changed by the authors. Our analyses can verify all but 7 out of over 1000 assertions. Due to necessary fixes to programs and our inability to run their tool, numbers are not directly comparable. Nevertheless, for their scalability tests, reported runtimes for Watts are up to two orders of magnitude worse than ours. See [49, Appendix I] for a more detailed discussion.

<sup>5</sup> The analyzer is single-threaded, so it only used one (virtual) core per analysis job.


(a) Number of discovered thread ids and proportion of program points where analysis with thread ids is more precise.

Fig. 13: Precision and performance evaluation on the Goblint benchmark set.

Ratcop benchmarks [42]. These were Java programs. After manual translation to C, our analyzer succeeded in proving all assertions any configuration of Ratcop could with Octagons, while Ratcop required polyhedra in one case.

Internal comparison. We evaluated our analyses in more detail on the Goblint benchmark set [48]. Fig. 13a shows sizes of the programs (in Logical LoC) and the number of thread ids found by the analysis from Section 6. The high number of threads identified as unique is encouraging. To evaluate precision, we compared the abstract values at each program point (joined over contexts). Fig. 13a shows for what proportion of program points tracking thread ids increases precision. There were no program points where precision decreased or values became incomparable, while for some programs gains of over 50% were observed. Fig. 13b illustrates runtimes. In 9 of 12 cases, performance differences between our relational analyses are negligible. In all cases, using clusters incurs no additional cost. Thus, the more precise analysis with clusters of size ≤ 2 seems to be the method of choice for thread-modular relational abstract interpretation.

# 10 Related Work

Since its introduction by Miné [36, 37], the weakly relational numerical domain of Octagons has found wide-spread application for the analysis and verification of programs [8, 14]. Since tracking relations between all variables may be expensive, pre-analyses have been suggested to identify clusters of numerical variables whose relationships may be of interest [8, 14, 26, 45]. A dynamic approach to decompose relational domains into non-overlapping clusters based on learning is proposed by Singh et al. [55]. While these approaches trade (unnecessary) precision for efficiency, others try to partition the variables into clusters without compromising precision [15, 23, 24, 44, 54, 56]. These types of clustering are orthogonal to our approach and could, perhaps, be combined with it.

The integration of relational domains into thread-modular abstract interpretation was pioneered by Miné [39]. His analysis is based on lock invariants determining for each mutex a relation which holds whenever the mutex is not held. Weak interferences are used to account for asynchronous variable accesses. For practical analyses, a relational abstraction only for lock invariants is proposed, while using a coarse, non-relational abstraction for the weak interferences. This framework closely follows the framework for non-relational analysis [38], while abandoning background locksets. Our relational analysis, on the other hand, maintains at each mutex a only relations between variables write-protected by a. For these relations more precise results can be obtained, since they are incorporated into the local state at locks by meet (while [39] uses join).

Monat and Miné [40] present an analysis framework which is orthogonal to our approach. It is tailored to the verification of algorithms that do not rely on explicit synchronization via mutexes such as the Bakery algorithm. Suzanne and Miné [57] extend [40] to handle weak memory effects (PSO, TSO) by incorporating memory buffers into the thread-local semantics. The notion of interferences is also used by Sharma and Sharma [52] for the analysis of programs under the Release/Acquire Memory Model of C11 by additionally tracking abstractions of modification sequences for global variables. They consider fixed finite sets of threads only, and do not deal with thread creation or joining.

Earlier works on thread-modular relational analysis rely on Datalog rules to model interferences in the sense of Miné in combination with abstract interpretation applied to the Data-Flow Graph [19] or the Control-Flow Graph [31] (later extended to weak memory [32]), respectively. Botbol et al. [10] give a non-thread-modular analysis of multi-threaded programs with message-passing concurrency by encoding the program semantics as a symbolic transducer.

In all these approaches clusters of variables, if there are any, are predefined and not treated specially by the analysis. This is different in the thread-modular analysis proposed by Mukherjee et al. [42]. It propagates information from unlocks to locks. It is relational for the locals of each thread, and within disjoint subsets of globals, called regions. These regions must be determined beforehand and must satisfy region-race freedom. In contrast, the only extra a priori information required by our analysis is the sets of (write-) protecting mutexes of globals – which can be computed during the analysis itself. The closest concept within our approach to a region is the set of globals jointly protected by mutexes. These sets may overlap – which the analysis explicitly exploits. Like ours, their proof of correctness refers to a thread-local semantics. Unlike ours, it is based on interleavings and thus overly detailed. The concrete semantics on which our analyses are based is a collecting local trace semantics extending the semantics of Schwarz et al. [48] by additionally taking thread termination and joins into account. The analyses in [48], however, are non-relational. No refinement via further finite abstractions of local traces, such as thread ids, is provided.

The thread id analysis perhaps most closely related to ours is by Feret [20], who computes ids for agents in the π-calculus as abstractions of sequences of encountered create edges. Another line of analysis of concurrent programs deals with determining which critical events may happen in parallel (MHP) [1–4, 7, 17, 43, 59] to detect programming errors like, e.g., data races, or to identify opportunities for optimization. Mostly, MHP analyses are obtained as abstractions of a global trace semantics [18]. We apply related techniques for improving thread-modular analyses – but based on a local trace semantics. Like MHP analyses, we take thread creation and joining histories as well as sets of held mutexes into account. Additionally, we also consider crucial aspects of the modification history of globals and provide a general framework for further refinements.

In a sequential setting, splitting control locations according to some abstraction of reaching traces is a common technique for improving the precision of dataflow analyses [9, 27] or abstract interpretation [25, 34, 41, 47]. Control point splitting can be understood as an instance of the reduced cardinal power domain [12, 13, 22]. For the analysis of multi-threaded programs, Miné [39] applies the techniques of Mauborgne and Rival [34] to single threads, i.e., independently of the actions of all other threads. Our approach, on the other hand, may take arbitrary properties of local traces into account, and thus is more general.

# 11 Conclusion and Future Work

We have presented thread-modular relational analyses of global variables tailored to decomposable domains. In some cases, more precise results can be obtained by considering smaller clusters. For k-decomposable domains, however, we proved that the optimal result can already be obtained by considering clusters of size at most k. We have provided a framework to incorporate finite abstractions of local traces into the analysis. Here, we have applied this framework to take creation as well as joining of threads into account, but believe that it paves the way to seamlessly enhance the precision of thread-modular abstract interpretation. The evaluation of our analyses on benchmarks proposed in the literature indicates that our implementation is competitive both w.r.t. precision and efficiency. In future work, we would like to experiment with further abstractions of local traces, perhaps tailored to particular programming idioms, and also explore the potential of non-numerical 2-decomposable domains.

Acknowledgements. This work was supported by Deutsche Forschungsgemeinschaft (DFG) – 378803395/2428 ConVeY and the Estonian Centre of Excellence in IT (EXCITE), funded by the European Regional Development Fund.

# References


1016/S0304-3975(98)00194-7, URL https://doi.org/10.1016/S0304-3975(98) 00194-7



# Adversarial Reachability for Program-level Security Analysis<sup>⋆</sup>

Soline Ducousso<sup>1</sup>, Sébastien Bardin<sup>1</sup>, and Marie-Laure Potet<sup>2</sup>

> <sup>1</sup> Université Paris-Saclay, CEA, List, Saclay, France soline.ducousso@cea.fr, sebastien.bardin@cea.fr <sup>2</sup> Univ. Grenoble Alpes, VERIMAG, Grenoble, France marie-laure.potet@univ-grenoble-alpes.fr

Abstract. Many program analysis tools and techniques have been developed to assess program vulnerability. Yet, they are based on the standard concept of reachability and represent an attacker able to craft smart legitimate input, while in practice attackers can be much more powerful, using for instance micro-architectural exploits or fault injection methods. We introduce adversarial reachability, a framework that makes it possible to reason about such advanced attackers and check whether a system is vulnerable or immune to a particular attacker. As equipping the attacker with new capacities significantly increases the state space of the program under analysis, we present a new symbolic exploration algorithm, namely adversarial symbolic execution, injecting faults in a forkless manner to prevent path explosion, together with optimizations dedicated to reducing the number of injections to consider while keeping the same attacker power. Experiments on representative benchmarks from fault injection show that our method significantly reduces the number of adversarial paths to explore, making it possible to scale up to 10 faults where prior work times out for 3 faults. In addition, we analyze the well-tested WooKey bootloader, and demonstrate the ability of our analysis to find attacks and evaluate countermeasures in real-life security scenarios. We were especially able to find an attack not mentioned in a previous patch.

Keywords: Program analysis · Attacker model · Fault injection · Symbolic execution

# 1 Introduction

Context. Major works have delved into program analysis over the last decades, leveraging techniques such as symbolic execution [24,53,18], static analysis [43], abstract interpretation [30] or bounded model checking [29], to hunt for software vulnerabilities and bugs in programs, or to prove their absence [35,60], leading to industrial adoption in some leading companies [18,43,6,60,66]. As bugs are an attack entry point, removing them is a first step towards better software security.

<sup>⋆</sup> Partially supported by grants ANR TAVA, PEPR Secureval and Carnot Flexsecurity.

Problem. Yet, stepping back from these successes, it appears that all these methods consider a rather weak threat model, where the attacker can only craft smart "inputs of death" through legitimate input sources of the program, exploiting corner cases in the code itself. Tools only looking for bugs and software vulnerabilities may deem a program secure while the bar remains quite low for an advanced attacker, able for example to take advantage of attack vectors such as (physical) hardware fault injections [58], micro-architectural attacks [61,70], software-based hardware attacks [86,55,69] like Rowhammer [70], or any combination of vectors [63]. While previously limited to high-security devices and systems such as smart cards and cryptography modules [16,13], these fault-based attacks can now target a wider spectrum of systems, such as bootloaders [57], firmware update modules [19], security enclaves [69], etc. The reasoning behind automated software-implemented fault injection also applies to Man-At-The-End attacks [3] and is similar to the (manual) reasoning performed in control-flow integrity to evaluate countermeasures [1,21].

Goal & Challenges. Our goal is to devise a technique to automatically and efficiently reason about the impact of an advanced attacker onto program security properties, where the standard reachability framework only supports an attacker crafting smart legitimate inputs. The first challenge is to provide a formal framework to study what an advanced attacker can do to attack a program. Interestingly, while such frameworks are routinely used in cryptographic protocol verification [26,7], none has been studied for program-level analysis. The second challenge is to design an efficient algorithm to assess the vulnerability of a program to a given attacker model, while adding capabilities to the attacker naturally gives rise to a significant path explosion – especially in the case of multiple fault analysis.

The few prior works in the field, mostly focused on encompassing physical fault injections for high-security devices, rely on either mutant generation [28,79,49,25,50] or forking analysis [76,15,20,63], both of which raise scalability issues. Moreover, most of them are limited to a few predefined fault models and do not propose any formalization of the underlying problem.

Proposal. We propose adversarial reachability, a formalism extending standard reachability to reason about a program execution in the presence of an advanced attacker, and we build a new algorithm based on symbolic techniques, named adversarial symbolic execution, to address the adversarial reachability problem from the bug finding point of view (bounded verification). Our algorithm prevents path explosion thanks to a new forkless encoding of faults. We show it is correct and k-complete with respect to adversarial reachability. To improve the performance further, we design two new optimizations to reduce the number of injected faults: Early Detection of fault Saturation and Injection On Demand.

Contributions. As a summary, we claim the following novelties:

– We formalize the adversarial reachability problem (Section 4), extending standard reachability to take into account an advanced attacker, together with the associated correctness and completeness definitions;


This work is a first step in designing efficient program analysis techniques able to take into account advanced attackers. The approach is generic enough to accommodate many common fault models, including the bit flip from RowHammer, test inversion or arbitrary data modification; still, instruction skips or modifications are currently out of reach. Also, while we investigate the bug finding side of the problem (underapproximation), the verification side (overapproximation) is interesting as well. These are exciting directions for future research.

Our dataset and benchmark infrastructure are made available through an artifact<sup>2</sup> for reproducibility purposes, and the code is open-sourced<sup>3</sup>.

# 2 Motivation

We start by motivating the need for adversarial reachability, first with a description of several realistic attack scenarios on software involving advanced attackers (Section 2.1), second with a small example showing the need for dedicated analysis (Section 2.2).

#### 2.1 Fault Injection across Security Fields

We describe hereafter several real software-level security scenarios where the attacker goes beyond crafting legitimate input to abuse the system under attack. Interestingly, while these scenarios were historically focused on hardware-hardened high-security systems (such as smart cards) and associated with complex physical attack means, many recent scenarios do involve software-only attacks on standard systems, with targets encompassing cryptographic libraries, bootloaders, firmware updaters, security enclaves, etc.

<sup>1</sup> WooKey [14,89] is a secure USB mass storage device developed by the French National Security Agency, and has recently served as a challenge among French security evaluators.

<sup>2</sup> DOI: 10.5281/zenodo.7507112 https://zenodo.org/record/7507112#.Y7cLsKfMJhE

<sup>3</sup> https://github.com/binsec/binsec-ase

Hardware Fault Injection Attacks [58] cause erroneous computations by disturbing signal propagation in the chip with physical means such as electromagnetic pulses [39], laser beams [85,4], or power [19] and clock glitches. The associated fault models include bit-, byte- or word-level set and reset, bit-flips, instruction corruptions and instruction skips. State-of-the-art attacks involve multiple fault injections [59], as expected by the high level of attack potential in Common Criteria vulnerability analysis.

Software-implemented Hardware Attacks push the hardware into unstable states using software-controlled mechanisms, like delays in memory buses inducing bit-flips in data fetched from memory [55] or CPU voltage and frequency manipulations yielding bit-flips in the processor [86,69]. The notorious Rowhammer attack [70] abuses memory accesses to induce bit-flips in DRAM.

Micro-architectural Attacks use micro-architectural behaviors in unexpected ways. For example: Spectre (version v1) [62] exploits branch predictors in speculative executions, which can be seen as a test inversion followed by a rollback; Load Value Injection [87] injects arbitrary data into transient execution; race attacks [54] corrupt data of other running processes and can be seen as arbitrary data faults.

Man-At-The-End Attacks consider an attacker having full observability and control over a piece of software and its execution [3], with the goal of stealing sensitive data or code (reverse engineering attacks). The associated attacker model is hence very powerful, with capabilities such as halting and modifying data and code at any point of the execution.

CFI Reasoning In order to assess the power of Control-Flow Integrity (CFI) mechanisms, researchers [1,21] define hypothetical attackers by their capabilities, such as "write anything anywhere" or "write anything somewhere", and manually prove that their countermeasure is indeed able to thwart such an opponent. While not per se an applicative security scenario, the techniques developed in this paper could help automate such essential reasoning.

#### 2.2 Motivating Example

The motivating example in Figure 1 is a simple unrolled program inspired by the VerifyPIN benchmark [42], from the domain of hardware fault injection and smart cards. The user PIN digits u1 to u4 are checked against the reference digits ref1 to ref4, using the accumulator res. The attacker seeks to be authenticated (validate the assert l.16) without knowing the right digits (l.14).

```
 1 bool g_authenticated;
 2 int u1, u2, u3, u4, ref1, ref2, ref3, ref4;
 3
 4 void verifyPIN() {
 5   int res = 1;
 6   res = res * (u1 == ref1);
 7   res = res * (u2 == ref2);
 8   res = res * (u3 == ref3);
 9   res = res * (u4 == ref4);
10   g_authenticated = res;
11 }
12
13 void main(int argc, char const *argv[]) {
14   assert(u1 != ref1 || u2 != ref2 || u3 != ref3 || u4 != ref4);
15   verifyPIN();
16   assert(g_authenticated == true); /* Security oracle */
17 }
```
Fig. 1: Motivating example, inspired by VerifyPIN [42]

Here, the attacker indeed cannot succeed by only crafting legitimate inputs. However, an advanced attacker can leverage more powerful attack vectors to inject faults into the program in order to succeed. For instance, corrupting g\_authenticated to true at l.10 achieves the attacker's goal. Such a corruption could be obtained, for example, through a physical or Rowhammer attack.
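To make the scenario concrete, the following Python sketch replays the logic of Figure 1 by brute force. It is only an illustration under stated assumptions: the three-value digit domain, the reference PIN `REF`, the helper names `verify_pin` and `attack_exists`, and the choice to model a fault as overwriting the result of one assignment are ours, not part of the benchmark or of any tool discussed here.

```python
from itertools import product

DIGITS = range(3)      # assumption: tiny digit domain to keep enumeration cheap
REF = (1, 2, 0, 1)     # assumption: arbitrary reference PIN


def verify_pin(user, ref, fault_site=None, fault_value=None):
    """Mimic verifyPIN from Figure 1; optionally overwrite one assignment."""
    res = 1
    # Sites 0..3 are the four accumulator updates (l.6 to l.9).
    for site, (u, r) in enumerate(zip(user, ref)):
        res = res * int(u == r)
        if fault_site == site:
            res = fault_value          # arbitrary data fault on res
    g_authenticated = bool(res)
    # Site 4 is the assignment to g_authenticated (l.10).
    if fault_site == 4:
        g_authenticated = bool(fault_value)
    return g_authenticated


def attack_exists(max_faults):
    """Search wrong PINs (the l.14 assumption) for one that authenticates."""
    for user in product(DIGITS, repeat=4):
        if user == REF:
            continue                   # the attacker does not know the PIN
        if verify_pin(user, REF):
            return True
        if max_faults >= 1:
            for site in range(5):
                if verify_pin(user, REF, fault_site=site, fault_value=1):
                    return True
    return False


print(attack_exists(0), attack_exists(1))
```

Without faults no wrong PIN authenticates, whereas a single data fault (for instance at the site modelling l.10) is enough, which matches the discussion above.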

Program Analysis As expected, standard symbolic execution tools such as Klee [22], angr [84] or BINSEC [38] do not report any violation here, as they consider the simplest possible attacker. We can try to use SWiFI techniques [76,15,20,63] (detailed in Section 3.1) from high-security system evaluation. Yet, the standard forking approach does not scale with multiple faults: here, 166 paths are explored in 0.6 seconds for 1 fault, 2994 paths in 11 seconds for 2 faults, and each extra fault then adds a factor of 10 in explored paths and analysis time, until the analysis times out (12 hours) above 4 faults. In contrast, our forkless algorithm presented in Section 5 simulates fault injection without creating new paths and, in this example, shows a constant runtime as the number of faults increases from 1 to 10: we explore 9 paths in 0.2 seconds in all cases.

# 3 Background

We provide in this section background information on software-implemented fault injection, standard reachability and symbolic execution.


#### 3.1 Software-implemented Fault Injection (SWiFI)

SWiFI tools [28,76,79,15,49,25,20,50,63,68] have been developed in the high-security systems community to ease hardware fault injection campaigns, which are time consuming and require special equipment. SWiFI evaluates a program with the transformations induced by the effects of hardware faults, in order to find interesting attack paths. We distinguish two main SWiFI techniques.

First, the Mutant generation approach [28,79,49,25,50] consists in analyzing slightly modified versions of the program (named mutants), each of them embedding a different faulty instruction. Each mutant is then analyzed on its own. The main limitation of mutant generation is the explosion of mutants, in particular for multiple faults. Also, as the different mutants differ only slightly, analyzing each of them separately wastes lots of time repeating similar reasoning.

```
(a) Original statement          (b) Forking transformation

x := y + z                      if (fault_here)
                                  then x := fault_value
                                  else x := y + z
```
Fig. 2: Forking code transformation in pseudo-code

Second, the forking approach [76,15,20,63] consists in instrumenting the analysis (or the code, via instrumentation) to add all possible faults as forking points (branches) controlled by boolean values indicating whether a particular fault will be taken or not, plus constraints on the maximal number of faults allowed. A forking data fault is illustrated in Figure 2. A standard program analysis technique is then launched – typically symbolic execution or bounded model checking. Compared with mutant generation, this method allows sharing the analysis between the different possible faults. Still, the number of paths explodes with the number of possible faults (forking points).

Scalability Issues These two approaches yield an explosion of the whole search space w.r.t. the number of fault injection points in the program: the mutant approach leads to considering up to $C_n^k$ (k among n)<sup>4</sup> mutants for a program under analysis with n possible fault locations and k faults, while the forking approach yields up to $C_n^k$ paths to analyze for a single original program path with n possible fault locations and k faults.

In the following, we will consider the forking approach as the baseline – please keep in mind that the mutant approach scales worse.
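As a rough illustration of this growth, the binomial bound can be tabulated directly. The sketch below is ours; the helper name `fault_placements` and the choice of n = 20 fault locations are arbitrary assumptions.

```python
from math import comb


def fault_placements(n, k):
    """Number of ways to place k faults among n possible fault locations."""
    return comb(n, k)


# Assumed example: n = 20 fault locations, 1 to 4 faults.
for k in range(1, 5):
    print(k, fault_placements(20, k))
```

Each printed value bounds the number of mutants (or forked paths for one original path) that would have to be analyzed for that number of faults.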

Fault Models Supported fault models vary for each tool, but they are usually adapted from hardware fault models [47,82]. The most common fault models are (1) data faults such as arbitrary data modifications, set and reset of bytes, words or variables, bit-flips; and (2) instruction corruptions such as instruction skips and test inversions. Most tools are limited to one (sometimes two) hard-coded fault models. Only a few SWiFI tools can handle multiple faults [88,76,63,68], and they still suffer from scalability issues.

<sup>4</sup> Recall that $C_n^k = \binom{n}{k} = \frac{n!}{k!(n-k)!}$.

#### 3.2 Standard Reachability Formalization

Considering a program P, we denote by S the set of all possible states of P. A state is composed of the code memory, the data memory (i.e. the stack and heap), the state of registers and the location of the next instruction to execute. The set of input states of a program P is denoted $S_0 \subseteq S$. The set of transitions (or instructions) of the program is denoted T. The execution of an instruction t is represented by a one-step transition relation $\rightarrow_t \subseteq S \times S$. We write $s \rightarrow s'$ when $s \rightarrow_t s'$ for some $t \in T$. We extend the transition relation over any finite path $\pi \in T^*$ through composition. The transitive reflexive closure of $\rightarrow$ is denoted $\rightarrow^*$. Finally, we use $S \rightarrow s'$ as a shortcut for $\exists s \in S.\ s \rightarrow s'$, and $\rightarrow^{\leq k}$ for reachability in at most k steps.

We consider in the rest of the paper the case of location reachability: given a location l (instruction or code address) of the program under analysis, the question is whether we can reach any state s at location l. More formally, L is the finite set of locations of P, and we consider a mapping $loc : S \mapsto L$ from states to locations. For example, loc may return the program counter value. We write $S \rightarrow^* l$ as a shortcut for $\exists s' \in S.\ S \rightarrow^* s' \wedge loc(s') = l$.

Definition 1 (Standard reachability). A location l is reachable in a program P if $S_0 \rightarrow^* l$.
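For intuition, bounded location reachability can be checked on an explicit, finite transition system by a breadth-first search. The Python sketch below is only an illustration: the toy program, the state encoding as `(pc, x)` pairs and the names `reachable`, `step`, `loc` and `S0` are assumptions of ours.

```python
def reachable(initial_states, step, loc, target, k):
    """True if some state at location `target` is reachable in at most k steps."""
    frontier = set(initial_states)
    seen = set(frontier)
    for _ in range(k + 1):
        if any(loc(s) == target for s in frontier):
            return True
        successors = {s2 for s in frontier for s2 in step(s)}
        frontier = successors - seen      # only keep newly discovered states
        seen |= frontier
    return False


# Assumed toy program, with states (pc, x):
#   pc 0: if x > 5 goto 1 else goto 2
#   pc 1 and pc 2 have no successors.
def step(state):
    pc, x = state
    if pc == 0:
        return [(1, x) if x > 5 else (2, x)]
    return []


def loc(state):
    return state[0]


S0 = [(0, x) for x in range(4)]   # input states: x in {0, 1, 2, 3}

print(reachable(S0, step, loc, 2, 1), reachable(S0, step, loc, 1, 5))
```

With inputs restricted to x < 4, location 2 is reachable in one step while location 1 is not reachable at all.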

We now define correctness and completeness for a program analyzer.

Definition 2 (Correctness, completeness). Let $V : (P, l) \mapsto \{1, 0\}$ be a verifier taking as input a program P and a target location l.


We stress that while location reachability can be seen as a basic case, we consider it sufficient here for two reasons: first, it keeps the formalism light while remaining straightforward to generalize to stronger reachability properties (e.g., local predicates of the form (l, ϕ), sets of finite traces, etc.); second, it is already rather powerful on its own, as we can still instrument the code to reduce some stronger forms of reachability to it (e.g., adding local assertions or monitors).

#### 3.3 Symbolic Execution

Symbolic execution (SE) [52,83,23,24] is a symbolic exploration technique for standard reachability. Algorithm 1 gives a high-level view of a typical SE algorithm, adapted for location reachability<sup>5</sup>. The analysis follows each possible path π of a program up to a depth bound k. If π reaches the target, then we check whether π is indeed feasible by computing its path predicate Φ, a logical formula representing the path constraints over the input variables along π, and sending it to an SMT solver [12], which tries to decide whether the formula is satisfiable and, if it is, provides a model for the free variables (e.g. inputs; omitted here for simplicity). SE is correct for location reachability, and even k-complete if we assume a perfect encoding of path predicates.

#### Algorithm 1

Input: a program P, a bound k, a target location l

Output: Boolean value indicating whether l can be reached within k steps.

```
for path π in GetPaths(k) do
    if π reaches l then
        Φ := GetPredicate(π)
        if Φ is satisfiable then
            return true
        end
    end
end
return false
```


In this paper, we will focus on the evaluation of assignments and conditional jumps for SE, detailed in Algorithms 2 and 3 respectively, as this is where our adversarial symbolic execution will mainly differ from the standard one. It requires going slightly deeper into details. In practice, the program paths are explored incrementally. A worklist WL records all pending paths together with their associated path predicate and their next instruction to explore. On conditional branches, the symbolic path is split in two (one for each branch, updating the path constraint accordingly), and each new prefix is added to the worklist (Algorithm 3). Assignments are dealt with straightforwardly, simply adding a new logical variable definition to the path predicate<sup>6</sup> (notation: x ≜ y).

<sup>5</sup> More complex properties can be verified with the same principles, such as local predicate reachability, trace properties or hyper-properties [36].

#### Algorithm 3: Conditional jump evaluation in SE

Input: path predicate Φ, conditional jump instruction if cdt then $l_t$ else $l_e$

Data: a worklist WL containing the pending path prefixes to explore, as a list of pairs (path predicate, next location)

Output: WL updated in place

    Function eval_conditional_jump(Φ, cdt, l_t, l_e) is
        if Φ ∧ cdt is satisfiable then
            Add (Φ ∧ cdt, l_t) to WL
        end
        if Φ ∧ (¬cdt) is satisfiable then
            Add (Φ ∧ ¬cdt, l_e) to WL
        end
    end
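The worklist scheme described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the SMT solver is replaced by brute-force enumeration over a tiny input domain (`INPUTS`), symbolic values are represented as Python functions of the inputs, and the instruction encoding (`"assign"` / `"jump"` tuples) and the example `PROGRAM` are invented for illustration.

```python
from itertools import product

# Assumption: tiny input domain standing in for an SMT solver's search space.
INPUTS = [{"a": a, "b": b} for a, b in product(range(4), repeat=2)]


def satisfiable(phi):
    """Brute-force stand-in for an SMT query: is some input a model of phi?"""
    return any(all(c(inp) for c in phi) for inp in INPUTS)


def eval_conditional_jump(phi, cdt, l_t, l_e, store, depth, worklist):
    """Push each feasible branch on the worklist (in the spirit of Algorithm 3)."""
    if satisfiable(phi + [cdt]):
        worklist.append((phi + [cdt], l_t, store, depth))
    not_cdt = lambda inp: not cdt(inp)
    if satisfiable(phi + [not_cdt]):
        worklist.append((phi + [not_cdt], l_e, store, depth))


def symbolic_execution(program, target, k):
    """Return True if `target` is reachable within k instructions."""
    worklist = [([], 0, {}, 0)]   # (path predicate, location, store, depth)
    while worklist:
        phi, loc, store, depth = worklist.pop()   # depth-first exploration
        if loc == target:
            return True   # every pushed prefix has a satisfiable predicate
        if depth == k or loc not in program:
            continue
        instr = program[loc]
        if instr[0] == "assign":
            _, var, expr, nxt = instr
            # The store maps program variables to symbolic values.
            worklist.append((phi, nxt, {**store, var: expr(store)}, depth + 1))
        else:
            _, cond, l_t, l_e = instr
            eval_conditional_jump(phi, cond(store), l_t, l_e, store,
                                  depth + 1, worklist)
    return False


# Assumed toy program: 0: x := a + b ; 1: if x > 6 goto 2 else goto 3
PROGRAM = {
    0: ("assign", "x", lambda st: (lambda inp: inp["a"] + inp["b"]), 1),
    1: ("jump", lambda st: (lambda inp: st["x"](inp) > 6), 2, 3),
}

print(symbolic_execution(PROGRAM, 3, 2), symbolic_execution(PROGRAM, 2, 2))
```

Since a and b range over 0..3, the branch x > 6 is infeasible, so only location 3 is reported reachable.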

# 4 Adversarial Reachability

In this section, we detail the advanced attacker model we consider and define the adversarial reachability problem. In particular, advanced attackers can do more than carefully craft legitimate inputs to trigger vulnerabilities in software. They can use a wide variety of attack vectors (e.g. hardware fault injection attacks, software-implemented hardware attacks, micro-architectural attacks, software attacks, etc.), in any combination, and multiple times. We suppose that the prerequisites of the attack vectors have been met, and only consider the impact of the faults on the program under attack.

Our attacker model has three components: (1) a set of attacker actions, equivalent to fault models; (2) the maximum number of actions the attacker can perform; and (3) a goal, expressed here as a location reachability query.

Formally, given a program P with set of states S, set of transitions T and set of locations L, we extend the transition model described in Section 3.2 to include an adversarial transition $\rightsquigarrow_A \subseteq S \times S$ related to an attacker A, i.e. $T_A = T \cup \{\rightsquigarrow_A\}$. To specify practical fault models, restrictions are applied onto $\rightsquigarrow_A$, limiting what part of the state can be modified and how. For instance, when considering arbitrary data faults, only the data memory and the register values can be modified. Then, the transition relation of P under attacker A is denoted as $\mapsto_A \;=\; \rightarrow \cup \rightsquigarrow_A \;=\; (\bigcup_{t \in T} \rightarrow_t) \cup \rightsquigarrow_A$. We extend the notations from Section 3.2 to the relation $\mapsto_A$. In particular, $S \mapsto_A^* s'$ means $\exists s \in S.\ s \mapsto_A^* s'$, and the adversarial transition relation up to k steps is denoted $\mapsto_{A,\leq k}$.

<sup>6</sup> Actually, a symbolic state usually comprises the path predicate itself plus a mapping from program variable names to logical variable names, and assignments involve both creating new logical names and updating the mapping. We abstract away from these details.

Still, we need to take into account the maximum number of faults the attacker can perform along an execution. Given a path $\pi \in T_A^*$, π is said to be legit if it does not contain $\rightsquigarrow_A$, and faulty otherwise. The number of occurrences of transition $\rightsquigarrow_A$ in π is its number of faults. Given a bound $m_A$ on the fault capability of A, we define $\mapsto_{(A,m_A)}^*$ by limiting the adversarial reachability relation to paths π with at most $m_A$ faults. We consider $m_A$ to be $+\infty$ in case the attacker has no such limitation. For the sake of simplicity, in the following, we will consider $m_A$ as an implicit parameter of A, and simply write $\mapsto_A^*$ instead of $\mapsto_{(A,m_A)}^*$.

Definition 3 (Adversarial reachability). Given an attacker A with a fault budget $m_A$ and a program P, a location $l \in L$ is adversarially reachable if $S_0 \mapsto_A^* s' \wedge loc(s') = l$ for some $s' \in S$.

In the following, adversarial reachability of location l from a set of states $S_0$ will be denoted $S_0 \mapsto_A^* l$.

Proposition 1. Standard reachability implies adversarial reachability. The converse does not hold.

Proof. Standard reachability can be viewed as adversarial reachability with an attacker able to perform 0 faults.
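Proposition 1 can be observed on a small explicit transition system. In the Python sketch below, which is ours and purely illustrative, the toy program, the `(pc, x)` state encoding, the arbitrary-data fault model in `fault_step` (overwriting x with any value in 0..7) and all function names are assumptions.

```python
def adversarially_reachable(initial_states, step, fault_step, loc,
                            target, k, max_faults):
    """Bounded search over legit and adversarial transitions.

    Each search node is (state, faults_used); an adversarial transition is
    only taken while the fault budget is not exhausted.
    """
    frontier = {(s, 0) for s in initial_states}
    seen = set(frontier)
    for _ in range(k + 1):
        if any(loc(s) == target for s, _ in frontier):
            return True
        successors = set()
        for s, used in frontier:
            successors.update((s2, used) for s2 in step(s))
            if used < max_faults:
                successors.update((s2, used + 1) for s2 in fault_step(s))
        frontier = successors - seen
        seen |= frontier
    return False


# Assumed toy program, with states (pc, x):
#   pc 0: if x > 5 goto 1 else goto 2
def step(state):
    pc, x = state
    if pc == 0:
        return [(1, x) if x > 5 else (2, x)]
    return []


def fault_step(state):
    """Arbitrary data fault: x may be replaced by any other value in 0..7."""
    pc, x = state
    return [(pc, v) for v in range(8) if v != x]


def loc(state):
    return state[0]


S0 = [(0, x) for x in range(4)]

print(adversarially_reachable(S0, step, fault_step, loc, 1, 2, 0),
      adversarially_reachable(S0, step, fault_step, loc, 1, 2, 1))
```

With a budget of 0 faults the search coincides with standard reachability and location 1 stays unreachable; one fault on x makes it reachable, which illustrates that the converse of Proposition 1 fails.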

We redefine what it means for an analysis answering adversarial reachability to be correct, complete and k-complete.

Definition 4. Let $V_A : (P, A, l) \mapsto \{1, 0\}$ be a verifier taking as input a program P, an attacker A with fault budget $m_A$ and a target location l.


# 5 Forkless Adversarial Symbolic Execution (FASE)

In this section, we present our forkless algorithm for adversarial reachability. The analysis aims to find inputs and a fault sequence compatible with the considered attacker model and reaching the target location. Our primary goal is to deal with the potential path explosion induced by possible faults. Our design guiding principles are the following:


#### 5.1 Modelling Faults via Forkless Encoding

The forkless encoding aims to address the path explosion induced by the forking treatment of fault injection in prior works. It is designed mainly for data faults and consists of wrapping arithmetically an assignment right-hand side, as shown in Figure 3 for an arbitrary data fault. The activation of this fault location is determined by the symbolic Boolean value fault\_here, and the corrupted value of x is the fresh variable fault\_value.

The point is to embed the fault injection as an expression inside the logical formula, without any explicit path forking at the analysis top-level, in order to let the analyzer reason about both legit executions and faulty executions at the same time – this is akin to path merging in some ways, except that we do it only for the treatment of fault injection (we could also see the approach as avoiding undue path splits).

Multiple forkless arbitrary data encodings are possible. We chose to use the ite expression operator, an inlined form of if-then-else at the expression level. We also tried encodings inspired by branchless programming idioms (e.g., $b \cdot x + (1 - b) \cdot y$ for ite(b, x, y) with b a Boolean value); in our experiments they worked as well as the ite operator. Other data fault models are supported, such as set, reset, bit-flips, etc. Test inversion is also supported by applying faults to the condition of conditional jumps. Table 1 illustrates various forkless encodings. Note that the forkless encoding is not designed for instruction corruptions or instruction skips, as these modifications either yield permanent code modification or span several instructions.

(a) Original statement: `x := expr`

(b) Forkless transformation for arbitrary data fault: `x := ite fault_here ? fault_value : expr`

Fig. 3: Forkless injection technique
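The two encodings mentioned above, the ite operator and the branchless arithmetic idiom, can be compared on concrete values. The following Python sketch is ours; the function names and the 4-bit value range used for the exhaustive check are assumptions made for illustration.

```python
from itertools import product


def ite(b, x, y):
    """Expression-level if-then-else: x when b holds, y otherwise."""
    return x if b else y


def branchless(b, x, y):
    """Arithmetic idiom b*x + (1-b)*y, with b read as 0 or 1."""
    return int(b) * x + (1 - int(b)) * y


def faulted_assign(expr, fault_here, fault_value):
    """Forkless arbitrary data fault on the right-hand side of x := expr."""
    return ite(fault_here, fault_value, expr)


# Exhaustive agreement check on 4-bit values (assumed range).
agree = all(
    ite(b, x, y) == branchless(b, x, y)
    for b, x, y in product((False, True), range(16), range(16))
)
print(agree, faulted_assign(5, False, 9), faulted_assign(5, True, 9))
```

Both forms select the same value; what changes is only the shape of the expression handed to the solver.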


Table 1: Forkless encodings for various fault models

| Fault model | Original instruction | Forkless encoding |
|---|---|---|

Trade-off. While this sort of encoding indeed allows a significant path reduction compared to forking approaches, the corresponding path predicates are more complicated than standard path predicates, as they involve many extra symbolic variables for deciding whether the faults occur and for emulating their effect. We show later in this section how to reduce the number of these extra variables.

#### 5.2 Building Adversarial Path Predicates

Adversarial symbolic execution requires modifications to Algorithms 2 and 3, as illustrated in Algorithms 4 and 5 respectively.



#### Algorithm 5

Input: path predicate Φ, conditional jump instruction if cdt then $l_t$ else $l_e$

Data: fault counter $nb_f$, maximal number of faults $max_f$, worklist WL

Output: WL updated in place

    Function eval_conditional_jump(Φ, cdt, l_t, l_e) is
        if Φ ∧ cdt ∧ (nb_f ≤ max_f) is satisfiable then
            Add (Φ ∧ cdt, l_t) to WL
        end
        /* Idem for else branch (¬cdt) */
    end

The assign evaluation process embeds a wrapper encoding the fault in a forkless manner. Note that FaultEncoding involves the declaration of fresh symbolic variables for fault decisions and fault effects, hence the update of the path predicate Φ. Also, the fault counter $nb_f$ is updated, and a new potentially faulted expression $expr'$ is computed.

Note that checking if the fault counter $nb_f$ does not exceed the maximal number of faults $max_f$ can be performed at different places. We found the best trade-off is to augment the conditional jump queries to check if we could explore each branch without exceeding $max_f$ (see Algorithm 5), as checking at the end of a path often involves exploring many infeasible faulty paths.

We refer to this set of modifications as Forkless Adversarial Symbolic Execution (FASE).
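The interplay between the fault wrapper and the budget check at branches can be sketched as follows. This is not the BINSEC/ASE implementation: the brute-force `satisfiable` function stands in for an SMT solver, symbolic expressions are Python functions of a model, and `fault_encoding`, `branch_feasible` and the toy path are hypothetical names and data chosen for illustration.

```python
from itertools import count, product


def satisfiable(phi, variables):
    """Brute-force stand-in for an SMT query over finite variable domains."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        model = dict(zip(names, values))
        if all(c(model) for c in phi):
            return True
    return False


_fresh = count()   # source of fresh variable indices


def fault_encoding(expr, nb_f, variables, value_domain):
    """Wrap expr as ite(fault_here, fault_value, expr) with fresh variables."""
    i = next(_fresh)
    here, value = f"fault_here_{i}", f"fault_value_{i}"
    variables = {**variables, here: (False, True), value: tuple(value_domain)}
    faulted = lambda m: m[value] if m[here] else expr(m)
    new_nb_f = lambda m: nb_f(m) + int(m[here])   # symbolic fault counter
    return faulted, new_nb_f, variables


def branch_feasible(phi, cdt, nb_f, max_f, variables):
    """Conditional-jump query augmented with the fault budget check."""
    return satisfiable(phi + [cdt, lambda m: nb_f(m) <= max_f], variables)


# Assumed toy path: x := a ; y := x + 1 ; branch on (x > 5 and y < 2),
# with input a in 0..3 and faulted values in 0..7.
variables = {"a": tuple(range(4))}
nb_f = lambda m: 0
x, nb_f, variables = fault_encoding(lambda m: m["a"], nb_f, variables, range(8))
y, nb_f, variables = fault_encoding(lambda m: x(m) + 1, nb_f, variables, range(8))
cdt = lambda m: x(m) > 5 and y(m) < 2

print([branch_feasible([], cdt, nb_f, m, variables) for m in (0, 1, 2)])
```

On this toy path the branch needs both assignments to be faulted, so the query only becomes satisfiable once the budget reaches two faults, while a single path is explored throughout.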

#### 5.3 Algorithm Properties

We now consider the properties of the FASE algorithm.

Proposition 2. The FASE algorithm is correct and k-complete for adversarial reachability.

Sketch of proof. If our algorithm finds an adversarial path reaching the target location l, by providing specific input values and a fault sequence, then an attacker executing the program with the provided inputs and performing the proposed faults will reach its goal. Our algorithm is based on symbolic execution with bounded path depth and explores all possible attack paths according to the considered attacker model, hence its k-completeness for adversarial reachability.

Tightness of FASE. Consider a single path with no branching instruction and an assert statement to be checked at the end, together with f possible fault locations and a maximum of m faults. Then the forking SE yields up to $C_f^m$ paths to analyze, and as many queries to send to the solver. In the same scenario, FASE will analyze only the original path, and send a single query to the solver.

Still, the forkless encoding increases query complexity, as shown in Section 7. We present in the remainder of this section two mitigation techniques.

#### 5.4 Optimization via Early Detection of Fault Saturation (FASE-EDS)


#### Algorithm 6

```
Input: path predicate Φ, conditional jump instruction if cdt then l_t else l_e
Data: fault counter nb_f, maximal number of faults max_f, worklist WL
Output: WL updated in place

Function eval_conditional_jump_EDS(Φ, cdt, l_t, l_e) is
    if Φ ∧ cdt ∧ (nb_f < max_f) is satisfiable then
        Add (Φ ∧ cdt, l_t) to WL
    else if Φ ∧ cdt ∧ (nb_f == max_f) is satisfiable then
        Stop injection in this path
        Add (Φ ∧ cdt, l_t) to WL
    end
    /* Idem for else branch (¬cdt) */
end
```
The first angle we explore to minimize query complexity is to reduce the number of injection points by stopping the injection process as soon as possible. Indeed, fewer injection points mean fewer extra symbolic variables and in general smaller and simpler queries for the SMT solver. We call this optimization Early Detection of fault Saturation, and write FASE-EDS when it is activated.

Its difference compared to FASE is in handling conditional jumps, illustrated in Algorithm 6. Instead of checking whether a branch can be explored without exceeding the maximum number of faults, we double the check: (1) first we check whether the branch can be explored with strictly fewer faults than allowed. If the query is satisfiable, the analysis continues down that branch as usual; (2) if not satisfiable, we check whether the branch is feasible with exactly the maximal number of faults allowed. If not, the branch is infeasible and we stop as usual. Yet, if it is feasible, then we know that we have spent all allowed faults. We can thus continue the exploration without injecting any new fault in the corresponding search sub-tree, leading to simpler subsequent queries.

Proposition 3. FASE-EDS is correct and k-complete for the adversarial reachability problem.

Proof. FASE-EDS remains correct as it does not modify the path predicate computation, and it remains k-complete as it only prunes fault injections that are actually infeasible – and would have been proven so by the solver, later in the solving process.
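The double check of FASE-EDS can be illustrated on a single faulted assignment. As before, this Python sketch is ours: `satisfiable` is a brute-force stand-in for the solver, and the variable domains, the toy condition and the returned pair `(feasible, keep_injecting)` are assumptions chosen for readability.

```python
from itertools import product


def satisfiable(phi, variables):
    """Brute-force stand-in for an SMT query over finite variable domains."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        model = dict(zip(names, values))
        if all(c(model) for c in phi):
            return True
    return False


def eval_conditional_jump_eds(phi, cdt, nb_f, max_f, variables):
    """Return (feasible, keep_injecting) for the branch guarded by cdt."""
    # (1) Feasible with strictly fewer faults than allowed: go on as usual.
    if satisfiable(phi + [cdt, lambda m: nb_f(m) < max_f], variables):
        return True, True
    # (2) Feasible only with exactly max_f faults: the budget is saturated,
    #     so no new fault needs to be injected below this branch.
    if satisfiable(phi + [cdt, lambda m: nb_f(m) == max_f], variables):
        return True, False
    return False, False


# Assumed toy path: x := ite(fault_here, fault_value, a), branch on x > 5.
VARIABLES = {"a": range(4), "fault_here": (False, True), "fault_value": range(8)}
x = lambda m: m["fault_value"] if m["fault_here"] else m["a"]
nb_f = lambda m: int(m["fault_here"])
cdt = lambda m: x(m) > 5

print([eval_conditional_jump_eds([], cdt, nb_f, m, VARIABLES) for m in (0, 1, 2)])
```

With a budget of one fault the branch is only feasible by spending that fault, so injection can stop; with a budget of two, injection continues.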

#### 5.5 Optimization via Injection on Demand (FASE-IOD)

The second angle explored to reduce query complexity through the reduction of injection points is to inject faults on demand, only when they are truly needed. We call this optimization Injection On Demand, and write FASE-IOD when it is activated.

To inject faults on demand, we now build two path predicates along a path: the working path predicate Φ based on which solver queries are built (where we try to minimize fault injection), and the normal adversarial path predicate $\Phi_F$ computed in previous sections (encompassing all the faults seen so far).


#### Algorithm 7

```
Function eval_assign_IOD(Φ, Φ_F, cdt, x, expr) is
    Φ'_F, expr', nb_f := FaultEncoding(Φ_F, expr, nb_f)
```

#### Algorithm 8: FASE-IOD conditional jump evaluation

Input: path predicate Φ, conditional jump instruction if cdt then $l_t$ else $l_e$

Data: fault counter $nb_f$, maximal number of faults $max_f$, under-approximation counter under\_counter, worklist WL

Output: WL updated in place

    Function eval_conditional_jump_IOD(Φ, Φ_F, cdt, l_t, l_e) is
        if Φ ∧ cdt ∧ (nb_f ≤ max_f) is satisfiable then
            Add (Φ ∧ cdt, Φ_F ∧ cdt, l_t) to WL
        else if under_counter ≤ max_f then
            if Φ_F ∧ cdt ∧ (nb_f ≤ max_f) is satisfiable then
                Φ := Φ_F
                under_counter := under_counter + 1
                Add (Φ ∧ cdt, Φ_F ∧ cdt, l_t) to WL
            end
        end
        /* Idem for else branch (¬cdt) */
    end

Algorithms are updated accordingly. In particular, assignment evaluation is duplicated as shown in Algorithm 7: the normal symbolic assignment, with the original right-hand-side expression expr, is added to Φ, while $\Phi_F$ is updated with the fault encoding of the assignment, $expr'$.

The on-demand reasoning takes place in the conditional jump instruction process detailed in Algorithm 8. The basic idea is to first check branch feasibility with the simpler path predicate Φ, encompassing the least number of faults. We continue this way as long as we can, meaning we rely on standard reachability as much as we can.

When encountering a branch that is infeasible with Φ, we then check whether this branch is feasible with all the possible faults seen so far, i.e. using $\Phi_F$. If not, the exploration of this branch stops; otherwise, we know that Φ does not encompass enough faults to go further. We then replace Φ by $\Phi_F$ (called a switch) at this stage, and thus continue with strictly more faults. Note that this is straightforward as $\Phi_F$ and Φ only differ on fault injections. Then again, the new Φ will not accumulate any fault (until a new switch) while $\Phi_F$ continues accumulating all possible faults.

As a bonus, the number of path predicate switches gives us an under-approximation under\_counter of the number of faults already needed in the path under analysis. We use it to stop the injection early, when at least $max_f$ faults have been used.

Proposition 4. FASE-IOD is correct and k-complete for the adversarial reachability problem.

Proof. FASE-IOD explores the same feasible paths as FASE, hence preserving its properties.
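The switch between the working predicate Φ and the faulted predicate Φ_F can be sketched on one branch. This Python fragment is a simplified illustration of ours, not the tool's code: the solver is replaced by brute-force enumeration, program variable x is modelled as a solver variable constrained by a definition, and the function returns the updated triple instead of pushing it on a worklist.

```python
from itertools import product


def satisfiable(phi, variables):
    """Brute-force stand-in for an SMT query over finite variable domains."""
    names = list(variables)
    for values in product(*(variables[n] for n in names)):
        model = dict(zip(names, values))
        if all(c(model) for c in phi):
            return True
    return False


def eval_conditional_jump_iod(phi, phi_f, cdt, nb_f, max_f, under_counter,
                              variables):
    """Return (phi', phi_f', under_counter') for the branch, or None."""
    budget = lambda m: nb_f(m) <= max_f
    # First try the working predicate, which carries as few faults as possible.
    if satisfiable(phi + [cdt, budget], variables):
        return phi + [cdt], phi_f + [cdt], under_counter
    # Otherwise fall back on the fully faulted predicate and switch to it.
    if under_counter <= max_f and satisfiable(phi_f + [cdt, budget], variables):
        return phi_f + [cdt], phi_f + [cdt], under_counter + 1
    return None   # infeasible even with every fault seen so far


# Assumed toy path: after "x := a", phi defines x without fault and phi_f
# defines it through the forkless wrapper ite(fault_here, fault_value, a).
VARIABLES = {"a": range(4), "x": range(8),
             "fault_here": (False, True), "fault_value": range(8)}
PHI = [lambda m: m["x"] == m["a"]]
PHI_F = [lambda m: m["x"] == (m["fault_value"] if m["fault_here"] else m["a"])]
nb_f = lambda m: int(m["fault_here"])

switched = eval_conditional_jump_iod(PHI, PHI_F, lambda m: m["x"] > 5,
                                     nb_f, 1, 0, VARIABLES)
kept = eval_conditional_jump_iod(PHI, PHI_F, lambda m: m["x"] <= 3,
                                 nb_f, 1, 0, VARIABLES)
print(switched[2], kept[2])
```

The branch x > 5 forces a switch (and increments the counter), whereas x <= 3 is handled with the fault-free working predicate alone.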

#### 5.6 Optimizations Combination

#### Algorithm 9: FASE-IOD and FASE-EDS combination, conditional jump evaluation


Both optimizations can be combined as illustrated in Algorithm 9. Taking FASE-IOD as a basis, saturation detection is added in the faulted path predicate $\Phi_F$ queries at conditional branch handling. If saturation is detected, the main path predicate switches to $\Phi_F$, but $\Phi_F$ stops being updated and queried further down that path, which stops fault injection.

Proposition 5. The combination of FASE-EDS and FASE-IOD is correct and k-complete for the adversarial reachability problem.

Proof. This combination also explores all possible paths for the considered attacker models, like FASE, hence preserving its properties.

# 6 Implementation

We now provide details about our forkless adversarial symbolic execution (FASE) implementation, named BINSEC/ASE, for Adversarial Symbolic Execution. The code is open source<sup>7</sup>.

<sup>7</sup> https://github.com/binsec/binsec-ase

Binary-level Fault Injection. While our method works for any program abstraction level, we choose to implement it for the binary level, which makes more sense in many security scenarios. We implement our forkless adversarial symbolic execution on top of the BINSEC symbolic engine [38,40,10]. It has already been used in a number of significant case studies [9,81,80,36,37], and it is notably able to achieve bounded verification (k-completeness) and to reasonably deal with symbolic pointers [44].

We modified the path predicate computation of BINSEC 0.4.0 as described in Section 5, and implemented our dedicated optimizations FASE-EDS, FASE-IOD and FASE EDS+IOD. BINSEC consists of 60kloc of OCaml and our modifications add 6kloc. The attacker goal is specified as a local predicate to reach, using BINSEC directives. We currently support data faults such as arbitrary modification, bit-flip and reset. Test inversion is emulated through faulting the condition of conditional jumps. We let the user define an injection target range, made of multiple code address intervals. For large programs, it enables focusing on the security-critical sections. Finally, we also provide a blacklist for some memory locations which will never be faulted. The blacklist is mostly used for the stack register (esp in x86, which is concretized in the analysis) and the program counter, as our fault model includes neither tampering with the stack nor arbitrary control faults.

Details. Our exploration strategy is depth-first, and the underlying SMT solver is Bitwuzla [71]. We constrain the faulted values to differ from the original values in fault encodings, such that only true corruptions are reported as active faults.
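The "faulted value differs from the original" constraint can be illustrated on concrete 32-bit values. The sketch below is an assumption-laden simplification: the tool expresses this constraint symbolically in solver queries, whereas the hypothetical helpers `bit_flips`, `reset`, `is_active` and `active_faults` work on concrete integers only.

```python
MASK32 = 0xFFFFFFFF  # values are treated as 32-bit words (x86-32 setting)


def bit_flips(value):
    """All 32 single bit-flip corruptions of a 32-bit value."""
    return [(value ^ (1 << i)) & MASK32 for i in range(32)]


def reset(value):
    """Reset fault: the value is forced to zero."""
    return 0


def is_active(original, faulted):
    """A fault is active only if it truly corrupts the value."""
    return (faulted & MASK32) != (original & MASK32)


def active_faults(original, candidates):
    """Keep only the candidate faulted values that differ from the original."""
    return [c for c in candidates if is_active(original, c)]


# A bit-flip always changes the value, so all 32 candidates are active.
print(len(active_faults(5, bit_flips(5))))
# Resetting a value that is already zero is not a true corruption.
print(active_faults(0, [reset(0)]))
# For an arbitrary data fault, the only excluded candidate is the original value.
print(active_faults(3, [3, 4]))
```

Without this filter, a reset applied to a zero value would consume one of the attacker's faults while leaving the execution unchanged.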

# 7 Evaluation

We now evaluate our new algorithm for software verification against multi-fault attacks. We consider the following research questions.


Besides this evaluation, we also show the use of our method in a number of different security scenarios (Section 7.5), and on a larger case study (Section 8).

#### 7.1 Experimental Setting

The Machine Used. We ran our experiments on a cloud machine with an Intel Dual Xeon 4214R processor with 48 CPU cores and 384GB of RAM. Experiments ran in parallel on the 48 cores, each run using only one core.

The Attacker Model chosen in this evaluation can perform a varying number of faults. Its goal is expressed as a security oracle directly written in C for each benchmark, the computation of which is not faulted.

The Benchmark used here is a standard set of programs from the SWiFI literature on physical fault injections and high-security devices, characterized in Table 2. First, we take the 8 versions of VerifyPIN from the FISSC [42] benchmark suite, dedicated to the evaluation of physical fault attack analyses. VerifyPIN is an authentication program. There is one unprotected version and 7 different protected versions, some vulnerable, some resistant to one test inversion fault. We added 2 manually unrolled versions of the unprotected VerifyPIN, with a PIN size of 4 and 16, to add diversity to the benchmarks with programs without loops. An oracle is provided by FISSC, checking whether the user PIN truly corresponds to the reference PIN. Second, we take the 2 versions of the npo2 program from Le et al. [65], together with their oracles. Npo2 is a program computing an integer's upper power of two. The attacker's goal is to perform a silent data corruption, i.e. to change the end result without triggering countermeasures. One version is vulnerable to one arbitrary data fault; the second is resistant due to extra arithmetic checks.

Compilation. The benchmarks are written in C and have been compiled with gcc for the Intel x86-32 architecture, using the flag "-O0" to preserve countermeasures. For BINSEC compatibility, we use the "-static" flag to include the necessary library functions directly in the binary.


Table 2: Benchmark characteristics and statistics of a standard SE analysis (BINSEC analysis, no fault)

BINSEC Settings. We limit the maximal depth of an analysis to the depth necessary to perform an exhaustive non-faulty analysis, rounded up to the next hundred. We exhaustively explore all the possible paths up to this bound and do not stop at the first identified attack, in order to have comparable results. We set the global analysis timeout to 1 day. We fault values and not addresses; we do not directly fault the stack pointer or the program counter, and we do not fault the status flags unless explicitly specified.

#### 7.2 Correctness and Completeness in Practice (RQ1)

We first show that our tool works as expected on several codes with known ground truth. (1) We check that indeed, with no fault allowed, no attack is found in any of the benchmarks; (2) We check that indeed the insecure npo2 program is vulnerable to a single arbitrary data fault while the secure version is not – it can still be exploited with two faults; (3) According to their authors, the VerifyPIN versions 0 to 4 are vulnerable to one test inversion, while VerifyPIN 5 to 7 are resistant to it. We indeed reproduce these results. When allowing two faults, all VerifyPIN become vulnerable; (4) When using one arbitrary data fault against the VerifyPINs, all versions are found vulnerable. We manually check that indeed the identified attack paths make sense; (5) Our manually unrolled versions of VerifyPINs do not contain conditional branching instructions in the targeted function, making them resistant to test inversion. We check that this is the case, while they are still vulnerable to a single arbitrary data fault.

Conclusion. Our tool indeed can showcase a program vulnerability to fault injection attacks and prove resistance to fault injection attacks, as expected by the correctness and k-completeness properties of the underlying algorithms.

#### 7.3 Scalability (RQ2)

For this evaluation, we focus on an attacker capable of arbitrary data faults, as those weigh the heaviest on the analysis.

We take FASE-IOD as our best performing technique (see Section 7.4). We evaluate here its capability to handle multi-fault and avoid path explosion, compared to the forking technique. Results are illustrated in Figures 4 and 5. Note that all FASE variants explore the same number of paths, and are thus represented as FASE in Figure 5. For each benchmark, we took the arithmetic mean for 100 runs. Values presented here are the geometric mean over the benchmarks.
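The two-level aggregation described above (arithmetic mean over the runs of each benchmark, then geometric mean over the benchmarks) can be sketched as follows. The function name `aggregate` and the timing values are illustrative assumptions, not measurements from the paper.

```python
from statistics import fmean, geometric_mean


def aggregate(runs_per_benchmark):
    """Arithmetic mean per benchmark, then geometric mean across benchmarks."""
    per_benchmark = [fmean(times) for times in runs_per_benchmark.values()]
    return geometric_mean(per_benchmark)


# Made-up analysis times (seconds) for two hypothetical benchmarks.
timings = {
    "bench_a": [1.0, 3.0],  # arithmetic mean 2.0
    "bench_b": [8.0, 8.0],  # arithmetic mean 8.0
}
print(aggregate(timings))  # geometric mean of 2.0 and 8.0
```

The geometric mean is the usual choice when benchmark values span several orders of magnitude, since a single long-running benchmark then does not dominate the summary.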

FASE-IOD is 10 times faster than Forking for 1 fault, and 200 times faster for 2 faults on average. For the best-case benchmark, we are 224 times faster for 1 fault and 6121 times faster for 2. From three faults onward, Forking experiences timeouts, rendering values non-comparable. Half of the benchmarks time out for 3 faults, three quarters for 4 faults, 11 out of 12 for 6 faults and all of them after that. FASE-IOD never times out in this experiment. This scaling is enabled by avoiding path explosion. On average, Forking explores 50 times more paths for 2 faults than for one, while FASE-IOD only explores 3 times more paths. From Figure 4, we see that FASE on its own already scales better than Forking, being 3 times faster for 1 fault and 108 times faster for 2, and never experiencing timeouts either.

Conclusion. FASE-IOD shows improved scalability in terms of the maximum number of faults allowed, for the arbitrary data fault model, compared to the forking technique.

Fig. 4: Analysis time

Fig. 5: Average number of explored paths, Average solving time per query

Fig. 6: Number of queries sent to the solver

#### 7.4 Performance Optimization (RQ3)

We evaluate our forkless variants: FASE, FASE-EDS, FASE-IOD and FASE EDS+IOD, to determine which performs best for arbitrary data faults. Results are illustrated in Figures 4, 5 and 6.

We again vary the maximum number of faults from 1 to 10. Note that all FASE variants explore the same number of paths for each number of faults, as the optimizations reduce the number of faults injected but lose neither correctness nor k-completeness. FASE indeed generates complex queries<sup>8</sup>, taking on average around twice the time necessary for Forking queries to be solved. FASE-EDS then gains a little bit in that regard. FASE queries take only 1.04 times longer to solve on average for all fault numbers. The real improvement comes with the On-Demand logic of FASE-IOD (2.02 times faster on average over all fault numbers) and FASE EDS+IOD (2.02 times also), where query complexity drops to the level of Forking. This improvement in query complexity is achieved algorithmically at the price of query creation. However, due to more queries being arithmetically simplified, fewer queries are sent in the end to the solver for FASE-IOD (a factor of 0.88 on average over all fault values compared with FASE) and FASE EDS+IOD (a factor of 0.98). FASE-EDS sends approximately the same number of queries as FASE. The number of queries sent to the solver explodes for Forking, correlated with the path explosion experienced. In terms of performance, two trends appear as the number of faults allowed increases. FASE and FASE-EDS tend to be between 2 and 3 times slower than FASE-IOD and FASE EDS+IOD. In the end, FASE-IOD proves to be the fastest optimization (1.1 times faster than FASE EDS+IOD on average over all numbers of faults), likely due to the combination of on-demand logic and fewer queries than FASE EDS+IOD.

Conclusion. We retain FASE-IOD as our best performing forkless adversarial algorithm, up to 3.06 times faster than FASE.

#### 7.5 Other Experiments and Fault Models

CRT-RSA. Puys et al. [78] describe three versions of CRT-RSA: unprotected, Shamir version and Aumuller version. Only the last one is shown to resist the BellCoRe attack [16], which uses a single reset fault to break the cryptography. We were able to automatically reproduce the attack with 1 reset fault on the unprotected version of CRT-RSA, after 3s of analysis, and we were not able to find attacks on the other two versions in 10 days' time.

Secret-keeping Machine. Dullien [41] proposes two versions of a secret-keeping machine. The one based on linked lists is manually shown to be exploitable by an attacker able to perform a single bit-flip in the memory (not in registers), while the array version is shown to be secure against that. For this benchmark, we activated faults on variables used as addresses. We were able to reproduce the attack on the linked list implementation with one bit-flip fault and to show the array implementation is secure for this fault model. In addition, if we allow faults in registers too, the array implementation becomes vulnerable.

<sup>8</sup> When counting the number of ite operators introduced in queries, from having barely any in a run without faults, we reach around 2,800 ite per query on average for FASE and 1,500 for FASE-IOD for one fault.

SecSwift Countermeasure. We applied the SecSwift countermeasure, an LLVM-level protection developed by STMicroelectronics [45,27], to VerifyPIN version 0. We were able to find attacks yielding an early loop exit on this binary with either a single test inversion or a single arbitrary data fault. Since these paths belong to the CFG of the program, these attacks are not unexpected, yet it is still interesting that our method finds them automatically.

# 8 Case Study: the WooKey Bootloader

We now confront our tool with a real-life security system, WooKey.

Presentation of WooKey. First presented in 2018 by ANSSI, the French system security agency, the WooKey platform [14,89] is "a custom STM32-based USB thumb drive with mass storage capabilities designed for user data encryption and protection, with a full-fledged set of in-depth security defenses". Their choice to be open source and open hardware makes WooKey a relevant case study: it is a real-life, complex device, security focused and available for reproducibility. Note also that WooKey has been extensively analyzed, as it was the target of an ANSSI cybersecurity challenge for security professionals [5].

Security Scenario and Goal of our Study. We focus on the WooKey bootloader, a dual-bank system enabling hot firmware updates. The system is hardened; in particular, redundant test protections are present in critical sections to protect against test inversion faults. We consider the same attacker model as the ANSSI challenge did [5]: the attacker seeks to manipulate the bootloader logic to boot on the older firmware, more likely to contain security vulnerabilities. We also consider an attacker able to perform a single arbitrary data fault. We see in Table 2 that the WooKey bootloader is orders of magnitude larger than the programs used for evaluation in Section 7. WooKey is available as C code. We compile it as we did for the evaluation benchmarks (Section 7.1).
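To make the idea of a redundant test protection concrete, the following Python sketch models a test inversion as flipping the outcome of one conditional and enumerates small fault sets explicitly. It is a toy illustration under stated assumptions: the functions `simple_check`, `hardened_check` and `attacks` are hypothetical and unrelated to the WooKey code, and the explicit enumeration of fault sets is the naive approach, not the forkless encoding of the paper.

```python
from itertools import combinations


def branch(cond, index, inverted):
    """Evaluate a conditional; a test inversion fault flips its outcome."""
    return (not cond) if index in inverted else cond


def simple_check(ok, inverted=frozenset()):
    """Unprotected check with a single test (branch 0)."""
    if branch(ok, 0, inverted):
        return "GRANTED"
    return "DENIED"


def hardened_check(ok, inverted=frozenset()):
    """Check protected by a redundant re-test of the same condition."""
    if branch(ok, 0, inverted):
        if branch(ok, 1, inverted):  # redundant test (branch 1)
            return "GRANTED"
        return "FAULT_DETECTED"
    return "DENIED"


def attacks(check, n_branches, max_faults):
    """Fault sets of size <= max_faults that grant access although ok is False."""
    found = []
    for k in range(1, max_faults + 1):
        for faults in combinations(range(n_branches), k):
            if check(False, frozenset(faults)) == "GRANTED":
                found.append(faults)
    return found


print(attacks(simple_check, 1, 1))    # one inversion defeats the single test
print(attacks(hardened_check, 2, 1))  # no single inversion succeeds
print(attacks(hardened_check, 2, 2))  # two inversions defeat the doubled test
```

The doubled test resists any single inversion but falls to two, which is the pattern reported for the protected benchmark versions in Section 7.2 and the reason multi-fault analysis matters.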

We conduct the following three analyses:


We discuss these results in the following and we present briefly in Section 8 the discovery of two more known faults. Overall, it demonstrates that our technique can scale to binary-level real-size systems.

Analyze Key Parts of WooKey. Lacombe et al. find an attack in the loader\_exec\_req\_selectbank function (A1) and another in the loader\_exec\_req\_flashlock function (A2). They correspond to data corruption in branching conditions. We are able to find both attacks, linking faults back to their locations in the C code with debug information. We also find an additional attack, faulting another part of the loader\_exec\_req\_flashlock function (A3).

Analyze a Security Patch of WooKey. We now evaluate the protection scheme proposed by Lacombe et al. [63] for these attacks. It consists of four extra counter-measures named from CM1 to CM4. We found indeed that the full protection prevents attacks A1 and A2, as claimed by the authors of the patch. Yet, our analysis shows that the protection does not prevent the new attack A3.

Propose a New Patch and Evaluate It. We manually inspect these different analysis results to understand what happens. We have especially been able to identify the root cause of A3 and propose a dedicated countermeasure for it (named CMA). Also, by analyzing each counter-measure in isolation, we have been able to understand that counter-measures CM1 and CM3 do not block any attack path as they are redundant with other tests in the code and can be safely removed. Overall, our new patch (CMA + refined former patch) is shown by our tool to protect against all the attacks, for an attacker able to perform one arbitrary data fault (Table 3).


Table 3: Table summarizing the effects of countermeasures

Legend - ✓: attack path found by our tool / ✗: no attack found

Other Attacks on WooKey. We were also able to find two other known attacks on WooKey. (Attack vector combination) The iso7816 library, used in WooKey for secure communication, presents a vulnerability to fault injection which enables a software buffer-overflow in function SC\_get\_ATR [63]. Using an attacker with a single arbitrary data fault, we were able to reproduce this attack; (Faulty redundant test) Martin et al. [68] show an incorrect implementation of a redundant test to prevent single test inversion faults in the loader\_set\_state function. We reproduce this result.

# 9 Discussion

Fault Models. Our current approach does not support advanced control faults such as instruction corruption or instruction skip. Instruction corruption is out of scope as it permanently changes an instruction, while we modify computation results. It is related to self-modification, a notoriously difficult point to address in adversarial binary-level code analysis [17,77]. Instruction skip (or other arbitrary control jumps) could be modeled by local modification of the program counter, yet at the price of a huge path explosion. Also, regarding micro-architectural attacks, modeling Spectre attacks is difficult due to the speculative windows mechanism and its associated rollback.

Other Formal Methods. While in the paper we focus on symbolic execution, we believe the main optimization ideas developed here can be used with other formal techniques, e.g. Bounded Model Checking [29,31], Abstract Interpretation [34] or CEGAR [30]. Note that for each of them, fault injection may result either in path explosion or precision loss. Still, our forkless encoding should be able to help at least all approaches based to some extent on path unrolling.

Other Properties. The forkless encoding can surely benefit other classes of properties to be achieved by the attacker, especially those known to be supported by (extensions of) symbolic execution, for example: trace properties such as use-after-free, k-hyperreachability properties (secret leakage, privacy leakage, violation of constant-time, etc.) [36], the recent robust reachability proposal [48] for replicable bugs, etc. Our formalism itself is quite generic and can accommodate a wide range of properties, as we mainly keep the property unchanged but modify the underlying transition system. We could for example imagine an attacker willing to activate a non-terminating execution (denial of service).

Forkless Encoding and Instrumentation. Several prior works use code-level instrumentation [68] or LLVM-level instrumentation [76,63,65] in order to leverage standard program analyzers as is. The forkless encoding we propose can also be used this way, for more flexibility but without additional optimizations. Actually, we performed some experiments with Klee and a C-level forkless instrumentation, and do observe significant improvement over forking instrumentation.

# 10 Related Work

SWiFI. Prior work in SWiFI has already been discussed in Section 3. All methods in this domain consider low-level formalism: C [28,68], LLVM [76,63], binary [25,15,20,50]. Half of the techniques rely on the mutant approach [28,79,49,25,50], and the other half relies on forking [76,15,20,63]. While most approaches target attack finding (with symbolic execution and bounded model-checking), some do aim at full verification [79], especially with deductive verification [68,28]. Very few works consider multi-faults [76,63,68]. Interestingly, Lacombe et al. [63] propose a static way of reducing injection points on C programs, that is complementary to our own method; still, static analysis at binary-level is known to be hard. Note that a few methods do consider instruction skips [49,20,50], yet with path explosion issues.

Robustness Analysis. SWiFI is also used for robustness evaluation [64,74,56,88,65,72,32,90], in order to verify the correct behavior of error handling mechanisms. They rely also on forking or mutant techniques. The fault models are similar to hardware fault injection, yet multi-fault is not really an issue there, as faults are supposed to originate from safety issues (e.g. cosmic rays) and have no reason to accumulate unreasonably.

Formalizations and Fault Models. While it is common in the field of automated formal verification of cryptographic protocols to consider models of attackers (typically, extensions of the "Dolev-Yao" model) – either by specifying what the attackers can do [2] or what they cannot do [7], only very few formalizations of software-level attacker capabilities have been proposed so far. In software security, control-flow integrity attacks have been categorized by the capability an attacker needs [21], but these efforts have been restricted to manual reasoning. Interestingly, Given-Wilson et al. [51] propose a formalization of fault injection using Turing machines, but to our knowledge, no algorithm has been built for it. Also, Fournet et al. [46] propose a type system for program-level non-interference, taking into account an active adversary modeled as adversarial components able to perform any action at certain steps of the program.

Mutation Testing. Sometimes called software fault injection, mutation testing [75,33] aims to generate a comprehensive test suite by building test cases discriminating various mutants of a program, and is recognized as a very powerful testing criterion. As it focuses on coverage, mutant explosion cannot be avoided. Dedicated SE techniques [73,8,11,67] have been designed.

# 11 Conclusion

We formalize the concept of adversarial reachability, extending standard reachability to include the presence of an advanced attacker in program analysis, and we propose a dedicated symbolic algorithm for adversarial reachability, integrating a novel forkless encoding of faults together with dedicated optimizations. Our technique is shown to significantly reduce the number of paths to explore, and scales up to 10 faults on a standard SWiFI benchmark, where prior forking attempts time out for 3 faults. Also, we show that our method scales to realistic-size examples, such as the WooKey project, where we have been able to replay known fault attacks and even to find a vulnerability not mentioned in a recently proposed countermeasure patch.

# References



# Automated Grading of Regular Expressions

Su-Hyeon Kim<sup>1,3</sup>, Youngwook Kim<sup>2</sup>, Yo-Sub Han<sup>2</sup>, Hyeonseung Im<sup>1</sup>, and Sang-Ki Ko<sup>1</sup>

<sup>1</sup> Department of Computer Science & Engineering, Kangwon National University, Gangwon-do 24341, Republic of Korea {tngus98207,hsim,sangkiko}@kangwon.ac.kr

<sup>2</sup> Department of Computer Science, Yonsei University, Seoul 03722, Republic of Korea {youngwook,emmous}@yonsei.ac.kr

<sup>3</sup> Artificial Intelligence Research Center, Korea Electronics Technology Institute, Seongnam-si 13509, Republic of Korea suhyeon0123@keti.re.kr

Abstract. With the rapid transition to distance learning, automatic grading software becomes more important to both teachers and students. We study the problem of automatically grading the regular expressions submitted by students in courses related to automata and formal language theory. In order to utilize the semantic information of the regular expression, we define a declarative logic that can be described by regular language and at the same time has natural language characteristics, and use it for the following tasks: 1) to assign partial grades for incorrect regular expressions and 2) to provide helpful feedback to students to make them understand the reason for the grades and a way to revise the incorrect regular expressions into correct ones. We categorize the cases when students' incorrect submissions deserve partial grades and suggest how to assign appropriate grades for each of the cases. In order to optimize the runtime complexity of the algorithm, two heuristics based on automata theory are proposed and evaluated on the dataset collected from undergraduate students. In addition, we suggest Regex2NL which translates regular expressions to natural language descriptions to give insight to students so that they can understand how the regular expressions work.

Keywords: regular expressions · MSO logic · automated grading system · automata theory

# 1 Introduction

Regular expressions (regexes) are a great tool for the pattern matching problem as they can effectively describe pattern structures. Regexes are widely used in software applications such as search engines, text processing, programming languages, and compilers due to their compact representations. Although most developers find that regexes are powerful and flexible tools, they also feel that regexes are very difficult to learn for many reasons such as readability, validity, reliability, and so on [7,16].

There have been several interesting approaches to automatically grading student submissions in an automata-related course in the online education environment. Alur et al. [2] propose a technique for automatically grading students' DFA construction in automata courses while generating high-level hints for helping students understand how to correct their wrong submissions. For instance, they introduce the DFA edit difference to compute the amount of difference between the correct DFA and students' DFA and MOSEL (MSO-equivalent declarative logic) even to capture the case where the student's submission corresponds to a different logic in MOSEL. Later, D'Antoni et al. [6] utilize the DFA edit difference in order to generate natural language feedback explaining how to correct the submitted DFA. They also conduct an online survey to collect students' feedback about the quality, usability, and effectiveness of their grading system.

Kakkar [10] studies a similar problem, namely, the problem of grading regexes instead of DFAs. Inspired by the DFA edit difference [2], Kakkar proposes a new criterion called 'Regex Edit Distance', which is based on the string edit-distance between students' regexes and correct ones. However, both works suffer from the limitation that 'optimal' answers for the problems should be given by TAs, as they compare the students' submissions with the answers for giving partial grades. Recently, D'Antoni et al. [5] propose Automata Tutor v3 (abbreviated to AT v3 hereafter), which is the latest version of the previous work [2]. In AT v3, they include automated grading and feedback generation for a variety of new automata problems including the problems that ask to create regexes, context-free grammars, pushdown automata, and even Turing machines for a given description (e.g., a natural language description, or an automaton, or a grammar that belongs to a different class). However, they also rely on the string edit-distance for grading regexes, similar to the work of [10]. Note that AT v3 provides, as feedback, counterexamples for incorrect regexes, such as strings that should (or should not) be accepted by the students' regexes.

In this paper, we introduce an automated grading framework for regular expressions that gives reasonable grades and helpful feedback. The overall structure of our regex grading scheme is illustrated in Fig. 1. As the goal of the regex construction problem is to construct a regex from a natural language description, the TA first assigns the problem by giving the natural language description of the problem and the logic formula of the regex, which is one of the forms of the regular language. Then students submit the regex corresponding to the given description. Finally, we use three algorithms for generating more convincing partial grades and feedback by comparing the answer logic formula with the submission.

We aim to overcome several remaining limitations that have not been resolved by the earlier approaches. First, we claim that it is not appropriate to grade a student's regex just by calculating the string edit-distance with the 'solution regex'. There could be infinitely many regexes that describe the same language. Even when we consider the set of most compact regexes describing the regular language in question, there can be multiple regexes since it is not guaranteed that there is a unique minimal regex for a given regular language. Also, the string edit-distance cannot take the structural similarity into account while we can obtain hierarchical information from the tree form of the regex. Second, we should consider not only the syntactic discrepancies but also the semantic discrepancies arising from the misinterpretation of the problem. In order to compare the logical differences in real-time, the regex must be transformed with the logic and converted to DFA in polynomial time. However, there is no compact logic to do so. Lastly, there is a lack of abundant feedback that helps students study regexes. More detailed feedback such as suggesting the shortest form of the regex, logical differences between the answer and the submission, and organized form of the corner case would be more helpful than simple symbol correction feedback.

Fig. 1. Overview of our automated regex grading framework

In order to resolve the above-mentioned issues, we propose a 3-step regex grading scheme that considers both syntactic and semantic discrepancies between submitted regexes and answer logic formulas (natural language descriptions). More specifically, first, to consider the syntactic discrepancy, instead of comparing a student's regex with the solution regex, we compare the possible transforms of the student's regex with the language of the solution. To this end, we apply tree-level edits to the parse tree of the regex to detect the possible syntactic mistakes made by the student. As shown in Fig. 1, after one tree-edit adding the star operator to student A's submission $b + ab^{*}a$, the edited regex is equivalent to TA's logic $(b + ab^{*}a)^{*}$. Second, we take into account the possibility that a student simply misinterprets the specification of the language. For instance, we may consider that a submitted regex deserves a partial grade if the language expressed by the submission corresponds to a specification that is very similar to the given specification. Therefore, we consider the semantic discrepancy by applying logic-level edits to the logic formula for the specification and searching for a similar specification that exactly corresponds to the student's regex. In this way, by considering the 'similarity' to the student's regex, we can give a partial grade. For example, after one logic-edit changing the parameter from 'a' to 'b' on the TA's logic, the edited logic num\_div(b, 2, 0) is equivalent to student B's submission $(a + ba^{*}b)^{*}$. Finally, we take some corner cases into account, such as when the language of a submitted regex misses a reasonably small portion of the target language such as the empty string or a language consisting of a single symbol ($a^{*}$ or $b^{*}$ when $\Sigma = \{a, b\}$). For instance, we can find that $(b^{*}ab^{*}ab^{*})^{*}$ cannot generate strings that have zero a's and at least one b, while it generates the empty string. Moreover, we generate productive feedback for students using the byproduct of each partial grading algorithm so that they can understand what is wrong with the current submission and how to correct the submission into a correct regex.
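The three examples above can be checked on short strings with a small Python sketch. It is only a bounded sanity check under several assumptions: it enumerates all words over {a, b} up to a fixed length instead of testing DFA equivalence, it assumes that the TA's specification is "even number of a's" (so that num\_div(b, 2, 0) means "even number of b's"), and the helper names `words`, `language` and `spec` are hypothetical. Python's `re` syntax writes union as `|` rather than `+`.

```python
import re
from itertools import product


def words(alphabet="ab", max_len=6):
    """All words over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def language(pattern, max_len=6):
    """Words up to max_len fully matched by the regex pattern."""
    rx = re.compile(pattern)
    return {w for w in words(max_len=max_len) if rx.fullmatch(w)}


def spec(symbol, max_len=6):
    """Words with an even number of `symbol` (assumed meaning of num_div(symbol, 2, 0))."""
    return {w for w in words(max_len=max_len) if w.count(symbol) % 2 == 0}


even_a = spec("a")

# Syntactic discrepancy: student A's regex becomes correct after adding a star.
student_a_before = language("b|ab*a")
student_a_after = language("(b|ab*a)*")

# Semantic discrepancy: student B's regex matches the spec with 'a' replaced by 'b'.
student_b = language("(a|ba*b)*")

# Corner case: this regex only misses the non-empty words made of b's.
corner = language("(b*ab*ab*)*")
missed = even_a - corner

print(student_a_before == even_a, student_a_after == even_a)
print(student_b == spec("b"))
print(sorted(missed, key=len))
```

A bounded comparison like this can produce counterexample strings for feedback, but only an automaton-level equivalence test can establish that two languages are actually equal.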

The rest of the paper is organized as follows. Section 2 gives some definitions and notations. We introduce a set of declarative logic formulas for describing regular languages in Section 3 and our regex grading scheme in Section 4. The experimental results are provided in Section 5 and Section 6 concludes the paper.

# 2 Preliminaries

The size of a finite set $S$ is denoted by $|S|$. Let $\Sigma$ denote a finite alphabet and $\Sigma^{*}$ denote the set of all finite strings over $\Sigma$. For $m \in \mathbb{N}$, $\Sigma^{\le m}$ is the set of strings of length at most $m$ over $\Sigma$. A language over $\Sigma$ is a subset of $\Sigma^{*}$. Given a set $X$, $2^{X}$ denotes the power set of $X$. The symbol $\lambda$ denotes the empty string. We define $\mathrm{mod}(m, n)$ to be $\{k \mid k \bmod m = n,\ k \in \mathbb{N}\}$. We also define $\mathrm{ind}(w, x) = \{k \mid w[k : k + |x|] = x,\ k \in \mathbb{N}\}$, where $w[i : j]$ for $i \le j$ denotes a substring of $w$ concatenating characters of $w$ from index $i$ to $j - 1$, to be the set of indices where $x$ appears in $w$. Note that the index starts from 1.
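As a small illustration of these two definitions, the sketch below computes ind(w, x) with 1-based indices and tests membership in mod(m, n). The function names `ind` and `in_mod` are chosen here for illustration; since mod(m, n) is an infinite set, it is represented by a membership test rather than enumerated.

```python
def ind(w, x):
    """Set of 1-based positions k such that w[k : k + |x|] = x."""
    n = len(x)
    # Python slices are 0-based, so position k corresponds to w[k - 1 : k - 1 + n].
    return {k for k in range(1, len(w) - n + 2) if w[k - 1 : k - 1 + n] == x}


def in_mod(k, m, n):
    """Membership test for mod(m, n) = {k | k mod m = n}."""
    return k % m == n


print(ind("abab", "ab"))  # positions where "ab" starts
print(ind("aaa", "aa"))   # overlapping occurrences are both reported
print([k for k in range(10) if in_mod(k, 3, 1)])  # finite prefix of mod(3, 1)
```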

A regular expression (regex) over $\Sigma$ is $a \in \Sigma$ or the empty string $\lambda$, or is obtained by applying the following rules finitely many times. For regexes $R_1$ and $R_2$, the union $R_1 + R_2$, the concatenation $R_1 \cdot R_2$, and the Kleene-star $R_1^{*}$ are also regexes.

Now we introduce a formal logic to be used to formally describe languages. Let $w = w_1 w_2 \cdots w_n$ be a word over $\Sigma$. For any $i \in [1, n]$ and a symbol $a \in \Sigma$, we say that a letter predicate $a$ is true at $i$ in $w$ if $w_i = a$. For example, the logic formula $a(x) \wedge \exists y\,(y > x \wedge b(y))$ means that 'there is a symbol a at the position x and a symbol b at the position later than x'. It is readily seen that the formula describes the language described by the following regex: $a(a + b)^{*}b(a + b)^{*}$. It is well-known that regular languages are expressible in monadic second-order (MSO) logic [4].
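The letter predicate and the example formula can be evaluated directly on a concrete word, as in the following sketch. The helper names `letter` and `formula` are hypothetical, and the comparison with the regex assumes that x denotes the first position of the word, which is the reading under which the regex above starts with the symbol a.

```python
import re
from itertools import product


def letter(w, sym, i):
    """Letter predicate: sym(i) holds iff the i-th symbol (1-based) of w is sym."""
    return 1 <= i <= len(w) and w[i - 1] == sym


def formula(w, x):
    """Evaluate a(x) and (exists y. y > x and b(y)) on the word w."""
    return letter(w, "a", x) and any(
        letter(w, "b", y) for y in range(x + 1, len(w) + 1)
    )


# Bounded comparison with the regex a(a+b)*b(a+b)*, taking x = 1 (assumption).
rx = re.compile("a(a|b)*b(a|b)*")
agree = all(
    formula("".join(t), 1) == bool(rx.fullmatch("".join(t)))
    for n in range(7)
    for t in product("ab", repeat=n)
)
print(agree)
```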

Given a regex R, we define the parse tree T(R) to be the rooted tree representing the hierarchical structure of R. Each leaf is labeled by a symbol in Σ ∪ {λ} and each internal node is labeled by an n-ary operation such as · (concatenation) or + (union), or by the unary operation ∗ (Kleene-star). We define the regex tree edit-distance ed<sub>rt</sub>(R, R′) of two regexes R and R′ to be the tree edit-distance between the parse trees of R and R′. Note that the tree edit-distance between T(R) and T(R′) is defined as the minimum number of edit operations required to transform the tree T(R) into T(R′), where an edit operation for the regex tree edit-distance is a substitution of an operation symbol or a character from Σ by a different operation symbol or character from Σ, an insertion of a node, or a deletion of a node. It should be mentioned that we perform unordered matching between the children of nodes labeled by the union operator +, as the order of elements inside a union does not matter.
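The unordered matching for union nodes can be illustrated with a minimal Python sketch. It assumes a hypothetical encoding of parse trees as nested tuples and only canonicalizes the order of union children; it does not compute the tree edit-distance itself.

```python
def canon(node):
    """Canonical form of a regex parse tree given as nested tuples.

    Leaves are symbols (str). Internal nodes are ("cat", c1, ..., cn),
    ("alt", c1, ..., cn) for union, or ("star", c).
    Children of a union are sorted so that their order does not matter.
    """
    if isinstance(node, str):
        return node
    op, *children = node
    children = [canon(c) for c in children]
    if op == "alt":
        children.sort(key=repr)
    return (op, *children)


a_plus_b = ("alt", "a", "b")
b_plus_a = ("alt", "b", "a")
print(canon(a_plus_b) == canon(b_plus_a))                      # True: union is unordered
print(canon(("cat", "a", "b")) == canon(("cat", "b", "a")))    # False: concatenation is ordered
print(canon(("star", b_plus_a)) == ("star", canon(a_plus_b)))  # True
```

Under this view, a + b and (b + a)<sup>∗</sup> differ only by one inserted star node.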

# 3 Simple Declarative Logic for Regular Languages

Since MSO logic formulas offer a relatively higher-level specification of regular languages than finite-state automata recognizing the languages, they can be used for describing regular languages in a human-readable format. Moreover, we can always compile an MSO logic formula for a regular language into a corresponding minimal DFA [12] and therefore, a regex as well.

As the transformation from MSO to DFA may require the size of the alphabet to grow exponentially in the number of nested quantifiers [8], we restrict our attention to logic formulas that can describe all regular languages considered in famous automata textbooks and can be converted into a corresponding DFA in polynomial time, at the cost of not covering the entire class of regular languages. Table 2 shows the list of declarative logic formulas considered in this paper. Recall that MOSEL [2], an extension of MSO logic with some syntactic sugar that allows describing regular languages more concisely, was introduced for a similar reason. However, we claim that our logic formulas directly correspond to NL descriptions at a much higher level and allow us to perform language equivalence tests in practical runtime.

Analogously to the parse tree of a regex, we define the parse tree T(φ) for a given logic formula φ. Here each leaf is labeled by an atomic formula and each internal node is labeled by the unary logical connective ¬ (negation) or by an n-ary logical connective such as ∧ (conjunction) or ∨ (disjunction). Similarly to the regex tree edit-distance, we also define the logic tree edit-distance ed<sub>lt</sub>(φ, φ˜) of two logic formulas φ and φ˜ as the unordered tree edit-distance between the parse trees of φ and φ˜. Note that, for the logic tree edit-distance, we allow the substitution of an atomic logic formula and of the two logical connectives, conjunction and disjunction. We also allow the insertion and deletion of negation. The substitution of an atomic logic formula is available for a single parameter such as strings x, y, non-negative integers m, n, and a comparison operator in {>, =, <}. While the edit cost of the substitution of a logical connective equals 1, we assign the string edit-distance for the substitution of a string parameter, the numerical difference for an integer, and the value 1 for the substitution of a comparison operator.
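The substitution costs for atomic formulas can be sketched as follows. This is a minimal illustration under our own assumptions: atomic formulas are encoded as hypothetical (name, args) pairs, and a change touching more than one parameter is not treated as a single substitution.

```python
COMPARISONS = {">", "=", "<"}


def levenshtein(s: str, t: str) -> int:
    """Classic string edit-distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute
        prev = cur
    return prev[-1]


def substitution_cost(phi, psi):
    """Cost of turning atomic formula phi into psi by changing one parameter.

    Formulas are (name, args) pairs, e.g. ("pos", ("aba", 3)).
    Returns None when the change is not a single-parameter substitution.
    """
    (name1, args1), (name2, args2) = phi, psi
    if name1 != name2 or len(args1) != len(args2):
        return None
    diffs = [(a, b) for a, b in zip(args1, args2) if a != b]
    if not diffs:
        return 0
    if len(diffs) > 1:
        return None
    a, b = diffs[0]
    if isinstance(a, int) and isinstance(b, int):
        return abs(a - b)                  # numerical difference
    if a in COMPARISONS and b in COMPARISONS:
        return 1                           # comparison operator
    if isinstance(a, str) and isinstance(b, str):
        return levenshtein(a, b)           # string parameter
    return None


print(substitution_cost(("pos", ("aba", 3)), ("pos", ("aba", 2))))          # 1
print(substitution_cost(("num", ("abab", ">", 0)), ("num", ("ab", ">", 0))))  # 2
```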

We provide a list of regex problems and solutions collected from famous automata textbooks in Table 1. For each problem, we provide a natural language description of the regular language in question, a solution regular expression given in the textbook, and the corresponding logic formula that we found. We denote a + λ by a<sup>?</sup> for brevity.


Table 1. A list of regex problems from famous automata textbooks.

# 4 Regex Grading Algorithm

In this section, we explain our automated regex grading algorithm by considering both syntactic and semantic properties.

#### 4.1 Grading of Regexes

Table 2. A list of declarative logic formulas used to describe regular languages that appear in famous automata textbooks, where m, n ∈ N, a, b ∈ Σ, x, y ∈ Σ<sup>∗</sup>, and ∈ {>, =, <}. In the set notation, we broadcast +n and −n for some integer n to each element of the given set.

Table 3. Examples of incorrect regexes for 'Even number of a's', which has a possible solution (b + ab<sup>∗</sup>a)<sup>∗</sup>.

Let us assume that exact logic formulas for the regular languages asked in questions are already known, as teachers can always specify the regular languages with the provided logic formulas in Table 2. We aim at grading the submitted regex in terms of two types of syntactic correctness and a set of counterexamples as follows:

Syntactic grading Recall that previous approaches to computing the syntactic similarity or dissimilarity between two regexes rely on string edit-distance between two regexes. However, the string edit-distance between two regexes does not take the structural similarity into account. We instead use the tree edit-distance between two parse trees of regexes as the tree edit-distance better reflects the structural similarity of regexes. One of the advantages of using the tree edit-distance is that we can also easily identify semantically equivalent regexes when they are viewed as parse trees rather than as strings.

Then, we define the syntactic grade of R based on the minimum tree edit-distance between R and an unknown regex R˜ such that L(R˜) = L(φ). Formally speaking, the syntactic grade of R is defined as follows:

$$G\_{\rm syn} = G\_{\rm full} - w\_{\rm syn}(R) \cdot \min \{ \text{ed}\_{\rm rt}(R, \tilde{R}) \mid L(\tilde{R}) = L(\phi) \}, \tag{1}$$

where G<sub>full</sub> means the full grade (10 in our implementation). The function w<sub>syn</sub> scales the deducted points based on the length of the submitted regex R because if R is very long and it requires a single edit, then we may consider that R is syntactically similar enough to a solution.

Let us explain the detailed procedure for computing G<sub>syn</sub>. We first parse the regex R as a binary tree and construct the set S<sub>R,n</sub> = {R˜ | ed<sub>rt</sub>(R, R˜) ≤ n} of regexes where each regex is within the tree edit-distance n (n = 2 in our experiments). Note that we use tree edit-distance instead of the string edit-distance used in AT v3 and RegED, as the tree edit-distance makes more sense for computing the syntactic difference between two regexes. For instance, the tree edit-distance between a + b and (b + a)<sup>∗</sup> is one while the string edit-distance is five.

For running the above procedure more efficiently, we increment the value of n from zero by one at each iteration until we find such R˜. We also check whether or not the current regex is already examined in the previous iteration by comparing the parse trees of regexes so that our implementation can avoid redundant regex equivalence tests.
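The incremental search described above can be sketched as a breadth-first exploration that raises the edit budget one step at a time and skips candidates seen in earlier iterations. The callables `neighbors` (all results of one tree edit) and `is_solution` (the language equivalence test against φ) are hypothetical placeholders for the actual tree-edit and DFA-equivalence routines; the usage below is a toy example over strings, not regexes.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple


def closest_solution(start: Hashable,
                     neighbors: Callable[[Hashable], Iterable[Hashable]],
                     is_solution: Callable[[Hashable], bool],
                     max_edits: int = 2) -> Optional[Tuple[int, Hashable]]:
    """Return (n, candidate) for the smallest n <= max_edits, or None."""
    seen = {start}              # candidates already examined or queued
    frontier = [start]          # candidates at exactly n edits
    for n in range(max_edits + 1):
        for cand in frontier:
            if is_solution(cand):
                return n, cand
        if n == max_edits:
            break
        next_frontier = []
        for cand in frontier:
            for nb in neighbors(cand):
                if nb not in seen:      # avoid redundant equivalence tests
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return None


# Toy usage: an "edit" appends one letter; the "solution" is the string "ab".
grow = lambda s: [s + "a", s + "b"]
print(closest_solution("", grow, lambda s: s == "ab"))               # (2, 'ab')
print(closest_solution("", grow, lambda s: s == "ab", max_edits=1))  # None
```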

Logical grading Given a problem 'A regex for strings where the string aba appears at the 3rd position.', a student may submit an incorrect solution (a + b)aba(a + b)<sup>∗</sup> by making the mistake of reading the number '3' as '2'. Because the most plausible answer is (a + b)(a + b)aba(a + b)<sup>∗</sup>, the student's submission is likely to receive no partial grade according to the syntactic grading, which could be a harsh decision for an elementary mistake. However, if we semantically compare the submission and the problem, there is hope of receiving a partial grade, as they turn out to be very similar in terms of the corresponding logic formulas pos(aba, 2) and pos(aba, 3).

The main challenge in logical grading is to find a logic formula that corresponds to the submitted regex such that we can effectively quantify the amount of semantic discrepancy between the submitted regex and the problem. Given a regex, finding a logic formula described as a logical combination of the formulas provided in Table 2 requires a considerable amount of computation, assuming that the only feasible approach is an exhaustive tree search. Even worse, it is not always possible to find such a corresponding logic formula, as the provided set of logic formulas cannot cover the entire class of regular languages. In order to save computation time, we instead utilize the solution logic formula by applying tree-level edits to the parse tree of the solution logic formula at most n times (again, n = 2 in our implementation) and checking whether the edited formula is language-equivalent to the submitted regex.

If we manage to find a logic formula φ˜ that corresponds to the submitted regex, then the logical grade of R is computed as follows:

$$G\_{\log} = G\_{\text{full}} - w\_{\text{log}}(\phi) \cdot \min \{ \text{ed}\_{\text{lt}}(\phi, \tilde{\phi}) \mid L(\tilde{\phi}) = L(R) \}. \tag{2}$$

Corner case grading In some cases, the submitted regex may describe a very similar language to the language in question although the regex is syntactically different (e.g., the tree edit-distance is larger than n). For instance, let us consider a problem with the following description: "Strings with even number of a's." provided in Table 3. The language described by the regex (b<sup>∗</sup>ab<sup>∗</sup>ab<sup>∗</sup>)<sup>∗</sup> is quite similar to the described language except for strings consisting only of b's. In order to check whether the submitted regex deserves a corner case partial grade, we construct two DFAs for the following languages: L(R) ∩ L(φ)<sup>c</sup> and L(R)<sup>c</sup> ∩ L(φ). The language L(R) ∩ L(φ)<sup>c</sup> is the set of strings that can be described by R and not by φ (false positive examples). On the contrary, L(R)<sup>c</sup> ∩ L(φ) captures the set of strings that are described by φ but not by R (false negative examples). We enumerate the strings from both DFAs in lexicographical order by using the enumDFA function in the FAdo library and display them to users so that the counterexamples help them understand why their submissions are not correct.
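The false positive and false negative sets can be illustrated without building DFAs. The sketch below swaps in a bounded brute-force enumeration using Python's standard `re` module instead of the DFA intersection and FAdo enumeration described above; the helper names and the length bound are assumptions, and Python writes union as `|` rather than `+` (the example regex contains no union).

```python
import re
from itertools import product


def strings_up_to(alphabet: str, max_len: int):
    """All strings over `alphabet` of length <= max_len, shortest first."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def counterexamples(pattern: str, in_language, alphabet="ab", max_len=4):
    """Bounded false positives (in R, not in phi) and false negatives."""
    regex = re.compile(pattern)
    false_pos, false_neg = [], []
    for w in strings_up_to(alphabet, max_len):
        accepted = regex.fullmatch(w) is not None
        if accepted and not in_language(w):
            false_pos.append(w)
        elif in_language(w) and not accepted:
            false_neg.append(w)
    return false_pos, false_neg


even_a = lambda w: w.count("a") % 2 == 0       # 'even number of a's'
fp, fn = counterexamples(r"(b*ab*ab*)*", even_a, max_len=3)
print(fp)   # []
print(fn)   # ['b', 'bb', 'bbb']
```

The only counterexamples up to length three are the non-empty strings consisting solely of b's, matching the observation above.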

We also assign a corner case grade G<sub>cor</sub> = (4/5) × G<sub>full</sub> if the false positive and false negative sets satisfy one of the following conditions:


#### 4.2 State Complexity of Logic Formula's DFAs

It is easy to see that all atomic logic formulas presented in Table 2 can be represented by DFAs of size linear in the lengths of string parameters. In the following proof, m, n ∈ N, a, b ∈ Σ, x, y ∈ Σ<sup>∗</sup>, and ∈ {>, =, <}.

Proposition 1. For each atomic logic formula φ in Table 2, we can construct a DFA recognizing L(φ) with a polynomial number of states in |x| and |y|.

While most of the formulas in Table 2 can be represented as DFAs of size linear in the numerical parameters m and n as well, there are two exceptions: 'pos\_rev(x, n)' and 'pos\_every\_rev(x, m, n)'.

Proposition 2. For each atomic logic formula φ in Table 2 except pos\_rev(x, n) and pos\_every\_rev(x, m, n), we can construct a DFA recognizing L(φ) with a polynomial number of states in m and n.

Fig. 2. An NFA for pos\_rev(a, n).

Unlike the other formulas, the state complexity of pos\_rev(x, n) and pos\_every\_rev(x, m, n) is exponential in n in the worst case.

Lemma 1. The state complexity of pos\_rev(x, n) is exponential in n.

Proof. Since the NFA construction for pos\_rev(x, n) requires |x| + n + 1 states, we have a simple upper bound 2<sup>|x|+n+1</sup>, which is exponential in n, for the state complexity of pos\_rev(x, n).

The simplest example where the lower bound is also exponential in n is when x is a string of length one such as a or b. See Fig. 2 for an NFA accepting the regular language pos\_rev(a, n). Since the initial state q<sub>0</sub> has a self-loop labeled by Σ, it is easy to see that the upper bound of the state complexity is 2<sup>n</sup> as q<sub>0</sub> is always in the state set in the subset construction.

Now we will show that the upper bound 2<sup>n</sup> can be reached by describing how we can reach any subset of states from 2<sup>{q<sub>1</sub>, q<sub>2</sub>, ..., q<sub>n+1</sub>}</sup>. Let us consider a state set P = {q<sub>s<sub>1</sub></sub>, q<sub>s<sub>2</sub></sub>, ..., q<sub>s<sub>k</sub></sub>}, where s<sub>i</sub> < s<sub>j</sub> for 1 ≤ i < j ≤ k ≤ n + 1. Then, we can reach P by reading the following string:

$$ab^{s\_k - s\_{k-1} - 1}ab^{s\_{k-1} - s\_{k-2} - 1} \cdots ab^{s\_1 - 1}.$$

Since it is easy to see that all states in 2<sup>{q<sub>1</sub>, q<sub>2</sub>, ..., q<sub>n+1</sub>}</sup> are pairwise distinguishable, we conclude that the state complexity of pos\_rev(a, n) is 2<sup>n</sup>.

Now the following state complexity is obvious from the above observation.

Proposition 3. The state complexity of pos\_every\_rev(x, m, n) is exponential in n.
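The exponential growth in Lemma 1 can be observed experimentally with a small subset construction. The sketch below assumes a standard NFA with states 0, ..., n for 'the n-th symbol from the end is a' (state 0 loops on every symbol and moves to state 1 on a; state i moves to i + 1 on every symbol; state n is accepting). This state numbering is our assumption and may differ from Fig. 2. The function only counts reachable subsets and does not check distinguishability.

```python
def reachable_subsets(n: int, alphabet: str = "ab") -> int:
    """Number of subsets reached by the subset construction."""

    def step(states: frozenset, ch: str) -> frozenset:
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)              # self-loop on every symbol
                if ch == "a":
                    nxt.add(1)          # guess that this a is n-th from the end
            elif q < n:
                nxt.add(q + 1)          # count the remaining symbols
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        cur = stack.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


for n in range(1, 7):
    print(n, reachable_subsets(n))      # the count doubles with each n
```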

#### 4.3 Heuristics for Faster Computation

In order to avoid this exponential blow-up in the size of DFAs, we employ the following two heuristics for faster computation of grades.

Regex reverse trick Interestingly, we can avoid this exponential blow-up caused by pos\_rev(x, n) by reversing the given regex and the logic formula at the same time. We can trivially reverse the regex while maintaining the length and construct polynomial-sized DFAs for all reversed logic formulas except pos(x, n).

For instance, suppose that we are given a regex R and a declarative logic formula φ as follows:

$$\begin{aligned} R &= a(a+b)b^\*b \text{ and} \\ \phi &= \text{pos\\_rev}(b,n) \land \text{len}(>,3) \land \text{num}(a,>,1). \end{aligned}$$

In order to avoid the exponential blow-up by pos\_rev(x, n), we reverse R and φ as follows:

$$\begin{aligned} R' &= bb^\*(a+b)a \text{ and} \\ \phi' &= \text{pos}(b,n) \land \text{len}(>,3) \land \text{num}(a,>,1). \end{aligned}$$

Note that logic formulas such as len(, n) and len(x, , n) are reversal-invariant.
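Reversing a regex on its parse tree amounts to reversing the order of children at every concatenation node while leaving union and star nodes in place. The following sketch uses a hypothetical nested-tuple encoding of parse trees and a small printer; it reproduces the regex side of the example above and does not handle the logic formula.

```python
def reverse(node):
    """Parse tree of the reversed language (same size as the input)."""
    if isinstance(node, str):
        return node
    op, *children = node
    rev = [reverse(c) for c in children]
    if op == "cat":
        rev.reverse()                   # only concatenation changes order
    return (op, *rev)


def show(node) -> str:
    """Print a parse tree using + for union and * for Kleene-star."""
    if isinstance(node, str):
        return node
    op, *children = node
    if op == "star":
        inner = show(children[0])
        return (inner if isinstance(children[0], str) else f"({inner})") + "*"
    if op == "alt":
        return "+".join(show(c) for c in children)
    # concatenation: parenthesize unions so that precedence is preserved
    return "".join(f"({show(c)})" if not isinstance(c, str) and c[0] == "alt"
                   else show(c) for c in children)


R = ("cat", "a", ("alt", "a", "b"), ("star", "b"), "b")
print(show(R))             # a(a+b)b*b
print(show(reverse(R)))    # bb*(a+b)a
```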

Concise Normal Form Recall that we construct a set of regexes from a submitted regex R by applying parse tree level edits for computing the syntactic grade. The main computational bottleneck comes from the repetitive regex equivalence tests as there are too many regexes in the set. In order to reduce the size of the constructed set, we employ the concise normal form [11] of regexes, which has been shown to be useful for reducing the number of redundant regexes. For instance, we inductively apply substitution rules for subregexes such as R<sup>∗</sup>R → RR<sup>∗</sup>, R<sup>∗</sup>R<sup>∗</sup> → R<sup>∗</sup>, R + R<sup>∗</sup> → R<sup>∗</sup>, and (R<sup>∗</sup>)<sup>∗</sup> → R<sup>∗</sup> for concise regex representation and pruning of redundant regexes.
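The four substitution rules quoted above can be sketched as a single bottom-up rewriting pass over binary parse trees. This is only an illustration of these four rules under a hypothetical tuple encoding; it is neither a fixpoint computation nor the full concise normal form of [11].

```python
def is_star(node) -> bool:
    return not isinstance(node, str) and node[0] == "star"


def simplify(node):
    """One bottom-up pass of the four rewrite rules on binary trees.

    Nodes: symbol (str), ("star", r), ("cat", l, r), ("alt", l, r).
    """
    if isinstance(node, str):
        return node
    if node[0] == "star":
        inner = simplify(node[1])
        return inner if is_star(inner) else ("star", inner)   # (R*)* -> R*
    op, left, right = node[0], simplify(node[1]), simplify(node[2])
    if op == "cat" and is_star(left) and left == right:
        return left                                           # R*R* -> R*
    if op == "cat" and is_star(left) and left[1] == right:
        return ("cat", right, left)                           # R*R -> RR*
    if op == "alt" and is_star(right) and right[1] == left:
        return right                                          # R + R* -> R*
    if op == "alt" and is_star(left) and left[1] == right:
        return left                                           # R* + R -> R*
    return (op, left, right)


print(simplify(("star", ("star", "a"))))        # ('star', 'a')
print(simplify(("cat", ("star", "a"), "a")))    # ('cat', 'a', ('star', 'a'))
```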

#### 4.4 Description of Regex Grading Algorithm

Algorithm 1 precisely describes the whole procedure for computing the final grade of a student's regex R for a problem corresponding to a declarative logic formula φ. First, we preprocess the given student's regex R and declarative logic formula using the normal form and reverse trick for faster computation and convert them into the DFAs for partial grading. If the submission is equivalent to the solution, then give 10 points. If not, give the highest point among the three partial grades.

#### 4.5 Converting Regex to NL Description

Many researchers have studied the problem of translating an NL description into a corresponding regex [13,15,17]. Here we examine a dual problem, namely, the problem of converting a regex into an NL description (Regex2NL) to help regex learners easily understand the language accepted by the given regex. Consider (b + ab<sup>∗</sup>a)<sup>∗</sup> as an example again. Instead of merely translating the semantics of regex operators and symbols, our goal is to generate an 'easy-to-understand' NL description such as 'even number of a's' which corresponds to a logic formula defined in Table 1.

#### Algorithm 1: Our Regex Grading Algorithm

Input: A student's regex R and a declarative logic formula φ

Output: A grade, feedback of R for the problem specified by φ, and a set of counter-examples

1. Convert R into R′, which is in a regex normal form;
2. if φ contains pos\_rev(x, n) and not pos(x, n) then
   1. Reverse R′ and φ;
3. Construct a DFA A<sub>R′</sub> for R′ and a DFA A<sub>φ</sub> for φ;
4. if L(A<sub>R′</sub>) = L(A<sub>φ</sub>) then
   1. if |R′| < |R| then return 10, 'R can be written in more compact form such as R′', ∅;
   2. else return 10, 'Well constructed', ∅
5. else
   1. Compute (G<sub>syn</sub>, R˜) and (G<sub>log</sub>, φ˜) of R;
   2. Generate a set S of random strings from L(φ) ∩ L(R)<sup>c</sup>;
   3. if G<sub>syn</sub> > G<sub>log</sub> then return G<sub>syn</sub>, 'R should include ... to be the R˜', S;
   4. else return G<sub>log</sub>, 'R accepts a language specified by φ˜', S;

Our approach involves two steps, where we first find a logic formula corresponding to the regex and then translate the logic formula into an NL description by rules. It is worth noting again that there are regexes that cannot be effectively described by our logic. Therefore, it is not always possible to find a corresponding logic formula for a given regex even if we enumerate all logic formulas. Even if there exists a corresponding logic formula for the given regex, finding it takes too much time (more than one minute in general) for practical use in most cases. Hence we propose to use a deep learning-based approach that can predict a logic formula from a given regex with reasonably high accuracy in practical runtime (less than one second).

First, we train the Regex2Logic model, which translates a regex into a logic formula expressed in our simple declarative logic, using a sequence-to-sequence neural network with an attention mechanism [3]. For training, we use a dataset consisting of 13,437 pairs of regexes and logic formulas that are collected by time-consuming enumerations of regexes and logic formulas, and by regex templates. We split the pairs in the ratio 8:1:1 into training, validation, and test sets. We explain each process in more detail as follows:

1. Regex enumeration: enumerate regexes from the simplest one to more complex ones by increasing the depth of parse trees of regexes and searching for corresponding logic formulas until pre-defined thresholds (two for the depth, three for the length of argument strings and integers) for the complexity of logic formulas are reached.


Table 4. Statistics of the constructed regex-logic pair dataset used to train our Regex2NL model. φ, φ<sub>1</sub>, and φ<sub>2</sub> denote atomic logic formulas found by enumerations of regexes and logic formulas or regex templates.


Table 4 shows the statistics of our dataset, especially in terms of the distribution of logic formulas used. The conjunction or disjunction of the same logic formulas is counted as a conjunction or disjunction.

In order to construct a set of regex-logic pairs, we can manually define a regex in a generalized form for each logic formula with arbitrary arguments. We rely on the following list of regex templates for generating various regexes by changing arguments of the templates:

$$\begin{array}{l} -\ \operatorname{pos}(x,n) : \sigma^{(n-1)}x\sigma^{\*} \\ -\ \operatorname{pos\\_rev}(x,n) : \sigma^{\*}x^{R}\sigma^{(n-1)} \\ -\ \operatorname{len}(=,n) : \sigma^{n} \\ -\ \operatorname{len}(<,n) : (\sigma+\lambda)^{n-1} \\ -\ \operatorname{len}(<,n) : \sigma^{\*}+\sigma^{2}+\sigma^{3}+...+\sigma^{n-1} \\ -\ \operatorname{len}(>,n) : \sigma^{n+1}\sigma^{\*} \\ -\ \operatorname{len\\_div}(x,m,n) : \sigma^{n}(\sigma^{m})^{\*} \\ -\ \operatorname{len\\_div}(x,m,n) : (\sigma^{m})^{\*}\sigma^{n} \end{array}$$

By applying enumerated strings and integers as arguments, we can collect many regex-logic pairs. Once we discover the initial set of regex-logic pairs, we augment the data by combining the regexes and logic formulas with a regex operator + and a logical connective ∨, respectively.
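Instantiating templates with concrete arguments can be sketched as follows. The helpers are hypothetical and emit patterns in Python `re` syntax rather than in the regex notation of this paper: σ = a + b is written as the character class `[ab]`, and (σ + λ) as an optional character. Only four of the templates listed above are covered.

```python
import re

SIGMA = "[ab]"   # sigma = a + b, written as a Python character class


def pos(x: str, n: int) -> str:
    """sigma^(n-1) x sigma^*"""
    return f"{SIGMA}{{{n - 1}}}{re.escape(x)}{SIGMA}*"


def len_eq(n: int) -> str:
    """sigma^n"""
    return f"{SIGMA}{{{n}}}"


def len_lt(n: int) -> str:
    """(sigma + lambda)^(n-1)"""
    return f"(?:{SIGMA}?){{{n - 1}}}"


def len_gt(n: int) -> str:
    """sigma^(n+1) sigma^*"""
    return f"{SIGMA}{{{n + 1}}}{SIGMA}*"


# Regex-logic pairs obtained by enumerating small arguments.
pairs = [(pos(x, n), f"pos({x}, {n})") for x in ("a", "ab") for n in (1, 2)]
for regex, logic in pairs:
    print(logic, "->", regex)

print(bool(re.fullmatch(pos("aba", 3), "bbabaab")))   # True: aba starts at position 3
print(bool(re.fullmatch(pos("aba", 3), "babab")))     # False: aba starts at position 2
```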

Note that our Regex2NL achieves about 92.3% prediction accuracy on the test set. For the 167 incorrect regex submissions from students, our logical grading module finds 21 logic formulas that are within logic tree edit-distance two of the solution logic formula. Among the remaining 146 regexes, our model predicts 39 logic formulas that actually correspond to the given regexes. Together, the logical grading module and the Regex2Logic model provide 'easy-to-understand' NL descriptions for 60 of the 167 incorrect submissions (35.9%). We believe this is very useful, given that most regexes do not have corresponding logic formulas definable by the proposed set of simple declarative logic formulas, as we already discussed.

Then, we transform the logic formula given by Regex2Logic into a natural language description using heuristic templates. Such templates are easy to write, as the logic formulas already read much like natural language. The entire Regex2NL framework can be used not only for feedback on incorrect submissions but also for generating random regex problems. For example, we can first generate a random regex by enumerating regexes from the regex templates and then translate the regex into a natural language description, which yields a regex-NL pair that can serve as a regex problem.

#### 4.6 Feedback Generation

There are several natural types of feedback such as binary feedback (correct/wrong), an example, and a natural language-based conceptual hint. Binary feedback is the simplest yet necessary feedback that should be provided to students who submitted regexes. We can also simply generate a counterexample if the submitted regex is not correct. We focus on generating a natural language-based conceptual hint that describes the discrepancy between the desired solution and the submitted solution in an easily understandable manner.

When the submitted regex is not correct, there can be two cases as follows. First, the submitted regex should be slightly revised in order to accept the desired language. In this case, the most desirable feedback may be the way to revise the submitted regex. Second, the submitted regex accepts a semantically different language than the desired language as the student may have misinterpreted the question. Then, we may need to inform the student about the semantic discrepancy between the language described by the submitted regex and the desired regular language in an easily understandable manner.

For the first case, we provide the regex edit sequence between the submitted regex R and a regex R′ which is syntactically closest (with the smallest regex edit-distance) to R while accepting the regular language specified in the problem. For the second case, we suggest the logic edit sequence between the logic formula φ corresponding to R and a logic formula φ˜ specified in the problem. If the problem asks for a regular language "strings containing a substring abab at least once", which corresponds to num(abab, >, 0), and the submitted regex captures a regular language corresponding to num(ab, =, 0), then we provide the following feedback: "Consider substring abab instead of ab and operator > instead of =."
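The feedback sentence in the example above can be produced by comparing the two atomic formulas parameter by parameter. The sketch below is a minimal illustration; the (name, args) encoding and the parameter labels in `PARAM_KINDS` are assumptions introduced here, and only formulas with the same name are handled.

```python
# Assumed human-readable labels for the parameters of num(x, op, n).
PARAM_KINDS = {"num": ("substring", "operator", "count")}


def logic_feedback(solution, submitted):
    """Hint listing each parameter of `submitted` that differs from `solution`."""
    (name_s, args_s), (name_r, args_r) = solution, submitted
    if name_s != name_r:
        return None                     # different formulas: not handled here
    kinds = PARAM_KINDS[name_s]
    hints = [f"{kind} {want} instead of {got}"
             for kind, want, got in zip(kinds, args_s, args_r) if want != got]
    if not hints:
        return None
    return "Consider " + " and ".join(hints) + "."


print(logic_feedback(("num", ("abab", ">", 0)), ("num", ("ab", "=", 0))))
# Consider substring abab instead of ab and operator > instead of =.
```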

#### 4.7 Converting Logic Formulas to NL Descriptions

Table 5 shows the NL descriptions for each atomic logic formula used in the rule-based translation of logic formulas into NL descriptions. When a logic formula is formed by combining two or more atomic formulas such as φ<sub>1</sub> and φ<sub>2</sub> using logical connectives, we simply combine the corresponding NL descriptions. For example, let NL(φ) be the NL description of an atomic logic formula φ following the rules in Table 5. Then, NL(φ<sub>1</sub> ∧ φ<sub>2</sub>) is defined as 'The set of strings that satisfy the following conditions: NL(φ<sub>1</sub>) and NL(φ<sub>2</sub>)'.

Using this, we present regexes in more concise form even when the submitted regex is correct. Let us consider the problem 'all runs of a's have lengths that are multiples of three'. Note that the regex (aaa + b)<sup>∗</sup> can be a solution. If a student submits (aaa + b<sup>∗</sup>)<sup>∗</sup> + b<sup>∗</sup> as a solution, then the system should give the full grade since the submitted regex recognizes the desired regular language. While assigning a full grade to the submission, our algorithm provides (aaa + b)<sup>∗</sup> to the student by computing the concise normal form [11] of the submission so that the student can recognize that there is a better solution (in terms of syntactic conciseness).

# 5 Experiments

We recruited 20 undergraduate students who were taking or had taken an automata course at the time of conducting our research, and ran our automatic grading algorithm on students' regex submissions for ten selected exercises from famous automata textbooks [9,14,18]. In order to compare the results of automated grading with the previous approaches including RegED [10] and AT v3 [5], we implemented the algorithms in Python 3 on our own and used them for comparison. We cannot use the existing implementations directly, because they do not support a feature of adjusting the maximum number of allowed edits, and not all of them are supported as a tool. We utilized the Python 3 port<sup>4</sup> of the FAdo [1] package, which is an open-source library for the symbolic manipulation of automata and other computation models. We also restricted the number of edits allowed for partial grades to two in our algorithm and AT v3, and one in RegED since RegED applies edits from both solutions and submissions.

#### 5.1 Main Results

Table 6 shows the experimental results in terms of the statistics of grading results. We present the ratio of submissions that received partial grades by the considered grading algorithms in the 'Partial Total' column. The 'Partial G<sub>syn</sub>' column shows the ratio of regexes that received a partial 'syntactic grade' by AT v3, RegED, and our syntactic grading algorithms over all regexes. Since AT v3 and RegED only consider syntactic grading, values in this column show the ratio of regexes that received partial grades over all regexes. On the other hand, the 'Partial G<sub>log</sub>' column shows the ratio of regexes that received a partial 'logical grade' by our algorithm over all regexes. It is seen that AT v3 and RegED fail to assign partial grades to some regexes as they only consider syntactic differences with solution regexes, not the logic formulas behind the problem descriptions. Note that a higher ratio of partial grades does not always mean that the grades are 'well-deserved'. What matters is whether each partial grade is convincing. We will explain in the following section why RegED gives more partial grades than ours and why giving more partial grades is not necessarily a good choice.

<sup>4</sup> https://github.com/0xnurl/fado-python3

Table 5. Natural language descriptions of our declarative logic formulas.

Table 6. Performance comparisons of the proposed grading algorithm with baseline algorithms proposed in previous works [5,10].

To put it briefly, RegED gives partial grades to more regexes (45.3%) than AT v3 (30.2%) and even ours (40.7%). Table 7 shows several examples of the grades and feedback examples for students' submissions to the five problems in Table 1.

#### 5.2 Validity of Grading Results

In order to verify that our algorithm indeed assigns partial grades to submissions that are 'well-deserved', we provide several reasons.

First, we can find logical partial grades while AT v3 and RegED cannot. We demonstrate two examples for the case. For the problem with the following description 'even number of a's', our algorithm assigns a partial grade to the submission (a + ba<sup>∗</sup>b)<sup>∗</sup> while there is a possible solution (b + ab<sup>∗</sup>a)<sup>∗</sup>. Our logical grading module gives a partial grade, as it is possible that the student makes a simple mistake of confusing a with b. For the problem 'contains at most three a's', our algorithm assigns a partial grade to b<sup>∗</sup>(a + λ)b<sup>∗</sup>(a + λ)b<sup>∗</sup>(a + λ)b<sup>∗</sup>(a + λ)b<sup>∗</sup> while one of the possible solutions is b<sup>∗</sup>(a + λ)b<sup>∗</sup>(a + λ)b<sup>∗</sup>(a + λ)b<sup>∗</sup>. This is again possible due to our logical grading module, as the student could have confused numbers.

Second, our syntactic grading gives some partial grades by tree edits while the others cannot. For example, our syntactic grading gives a partial grade to (b<sup>∗</sup>a<sup>∗</sup>)abab(b<sup>∗</sup>a<sup>∗</sup>) for the problem 'contains the substring abab' as we may insert two star operators for the occurrences of (b<sup>∗</sup>a<sup>∗</sup>). However, RegED and AT v3 will not assign a partial grade if they are provided (a + b)<sup>∗</sup>abab(a + b)<sup>∗</sup> and (b + a)<sup>∗</sup>abab(b + a)<sup>∗</sup> as possible solutions, while our algorithm uses a logic formula as the solution. This is because RegED utilizes only one solution regex for comparing with the submitted regex and it allows edits from both the solution and the submitted regex. RegED performs an edit on the solution regex and on the submitted regex, respectively, to improve speed, but if the solution regex is not given in an ideal form as in the above example, RegED cannot grade properly. To solve this problem, all possible variants of the solution regex must be considered for editing and comparing, which is very time-consuming. We can compare with every possible candidate without additional time, as our regex grading uses logic for the solution and permits edits only in the submitted regex.

Table 7. Grading and feedback examples generated by our regex grading algorithm for problems in Table 1. We denote a + b by σ for brevity.

Third, the string edit used by RegED tends to cover far more candidates than our tree edit. For instance, a single string edit can change a+b+c to a<sup>∗</sup>b+c or aab+c. This may differ depending on the TA's point of view, but we believe that edits should be applied more strictly, respecting the tree structure that is inherent to regexes. Since string edits are more permissive than tree edits, they reach many regexes that do not correspond to an intended edit. Hence, assigning more partial grades is not always the right direction, as it often goes beyond what we intended.


Table 8. Evaluation for the similarity with TA partial grades.

#### 5.3 Comparison with TA Partial Grade

Table 8 demonstrates how well the grading results by the algorithms align with the human TAs' grading results. We asked five human TAs to give grades to 167 incorrect regex submissions by students. First, we calculate the precision, recall, and F1 score for each algorithm and for each TA. Precision is the percentage of partial grades by the algorithm that match the TA, and recall is the percentage of TA partial grades that the algorithm agrees with. Then we average the scores obtained by comparing the grading results with the results of each human TA. Since correct submissions should always receive full marks, we only consider incorrect submissions and check whether or not human TAs gave partial grades to the submissions. In other words, we assume that human TAs always make the right decisions in terms of giving partial grades to incorrect submissions and consider the cases where partial grades are given as positive cases. The 'Precision' column indicates how 'carefully' the algorithms select submissions that deserve partial grades, and the 'Recall' column shows how rarely the algorithms miss such cases.
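The evaluation protocol above can be sketched as follows, treating 'the TA gave a partial grade' as the positive class and averaging the scores over TAs. The function names are ours and the data in the usage lines is made up purely for illustration; it is not taken from Table 8.

```python
def precision_recall_f1(algo, ta):
    """algo[i], ta[i]: whether incorrect submission i got a partial grade."""
    tp = sum(a and t for a, t in zip(algo, ta))
    fp = sum(a and not t for a, t in zip(algo, ta))
    fn = sum(t and not a for a, t in zip(algo, ta))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def average_over_tas(algo, tas):
    """Average (precision, recall, F1) of one algorithm against several TAs."""
    scores = [precision_recall_f1(algo, ta) for ta in tas]
    return tuple(sum(col) / len(scores) for col in zip(*scores))


# Made-up decisions for four submissions and two TAs.
algo = [True, True, False, False]
ta_1 = [True, False, True, False]
ta_2 = [True, True, False, False]
print(precision_recall_f1(algo, ta_1))       # (0.5, 0.5, 0.5)
print(average_over_tas(algo, [ta_1, ta_2]))  # (0.75, 0.75, 0.75)
```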

Overall, our grading algorithm shows the best performance in terms of the F1 score, which is the harmonic mean of precision and recall. RegED is placed second, a small gap behind our algorithm, followed by AT v3.

Intuitively, it is natural that the recall is highest in RegED as RegED covers more regexes than the other compared algorithms. We can also see from the high precision of the logical grading module that the partial grade submissions captured by the logical grading module are quite precise even compared with the other modules used in our algorithm. However, the logical grading fails to capture the regexes that received partial grades by TAs from the other algorithms. On the other hand, the syntactic grading can capture many more regexes that received partial grades by TAs than the other modules in our algorithm. This also shows that human TAs tend to give partial grades to submissions with syntactic mistakes rather than to submissions with logical mistakes.

Fig. 3. Runtime comparison w/wo reverse trick. s<sup>n</sup> and c<sup>n</sup> indicate problems corresponding to logic formulas pos\_rev(a, n) and pos\_rev(a, n) ∧ num(bba, >, 0), respectively.

#### 5.4 Effectiveness of the Regex Reverse Trick

Fig. 3 demonstrates the effectiveness of the reverse trick in reducing the runtime of the proposed algorithm. For short regexes there is no noticeable difference. However, the difference in runtime becomes visible, on a log scale, as the length of the regex increases.

#### 5.5 User Study

In Fig. 4, we provide a screenshot of a web page for the online 'Regex Trainer' in which our regex grading algorithm is employed. The Regex Trainer page displays each regex construction problem in turn to a student. Once the student enters an answer to the problem, the system shows the grade with feedback and displays the next problem.

We conducted a user study by asking five questions to nine students who performed tests on the usability and usefulness of our regex grading algorithm. The result is shown in Table 9. Each student is asked to give their answer to each question on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). The result shows that average scores for the five questions are all in the range of [3.7, 4.4], which implies that the students in general find our grading system easy-to-use and useful for studying regexes.

#### 5.6 Limitations

In the following, we list the limitations of our study. First, the proposed set of logic formulas cannot express the entire class of regular languages. In future work, we may extend the set of formulas by adding useful logic formulas that are suitable for potential regex construction problems. Second, there could be other ways to catch students' mistakes. We suggest three partial grades that catch syntactic, logical, and corner case mistakes. Identifying new causes of mistakes could provide richer and more detailed feedback for students. Moreover, our grading algorithm is very likely to take too much time if the submitted regex is unnecessarily long, since in this case the number of regexes that must be examined increases exponentially.

Fig. 4. A screenshot taken from the web page of the online 'Regex Trainer', in which our automatic grading module is used.

Table 9. Student survey result. Nine students gave their judgments for the following five questions on a Likert scale from 1 to 5.

# 6 Conclusions

Due to the transition from face-to-face teaching to online distance learning, the importance of developing an automated grading system has become more evident. We have presented an efficient and powerful automated grading algorithm for regexes in undergraduate automata and formal language courses. Our algorithm takes students' regex submissions and assigns appropriate grades with productive feedback to the regexes by considering the syntactic and semantic alignment between the submitted regexes and the problem definition. Moreover, by employing several heuristics such as the reverse trick and intermediate regex simplification, we reduced the runtime complexity of the repetitive regex equivalence tests needed for grading regexes.

# Acknowledgments

We thank the reviewers for their valuable comments and suggestions for improving the presentation of the paper. This research was supported by the NRF grant (No. 2020R1A4A3079947), the IITP grant (No. 2022-0-00320), and the AI Graduate School Program (No. 2020-0-01361) funded by the Korea government (MSIT).

# Builtin Types Viewed as Inductive Families

Guillaume Allais

University of St Andrews, St Andrews, UK guillaume.allais@ens-lyon.org

Abstract. State of the art optimisation passes for dependently typed languages can help erase the redundant information typical of invariant-rich data structures and programs. These automated processes do not dramatically change the structure of the data, even though more efficient representations could be available.

Using Quantitative Type Theory as implemented in Idris 2, we demonstrate how to define an invariant-rich, typechecking-time data structure packing an efficient runtime representation together with runtime irrelevant invariants. The compiler can then aggressively erase all such invariants during compilation.

Unlike other approaches, the complexity of the resulting representation is entirely predictable, we do not require both representations to have the same structure, and yet we are able to seamlessly program as if we were using the high-level structure.

Keywords: Quantitative Type Theory · Indexed families · Runtime representation · Idris 2

# 1 Introduction

Dependently typed languages have empowered users to precisely describe their domain of discourse by using inductive families [13]. Programmers can bake crucial invariants directly into their definitions thus refining both their functions' inputs and outputs. The constrained inputs allow them to only consider the relevant cases during pattern matching, while the refined outputs guarantee that client code can safely rely on the invariants being maintained. This programming style is dubbed 'correct by construction'.

However, relying on inductive families can have a non-negligible runtime cost if the host language is compiling them naïvely. And even state of the art optimisation passes for dependently typed languages cannot make miracles: if the source code is not efficient, the executable will not be either.

A state of the art compiler will for instance successfully compile length-indexed lists to mere lists thus reducing the space complexity from quadratic to linear in the size of the list. But, confronted with a list of booleans whose length is statically known to be less than 64, it will fail to pack it into a single machine word thus spending linear space when constant would have sufficed.

In section 2, we will look at an optimisation example that highlights both the strengths and the limitations of the current state of the art when it comes to removing the runtime overheads potentially incurred by using inductive families.

In section 3 we will give a quick introduction to Quantitative Type Theory, the expressive language that grants programmers the ability to have both strong invariants and, reliably, a very efficient runtime representation.

In section 4 we will look at an inductive family that we use in a performance-critical way in the TypOS project [2] and whose compilation suffers from the limitations highlighted in section 2. Our current and unsatisfactory approach is to rely on the safe and convenient inductive family when experimenting in Agda and then replace it with an unsafe but vastly more efficient representation in our actual Haskell implementation.

Finally in section 5, we will study the actual implementation of our efficient and invariant-rich solution implemented in Idris 2. We will also demonstrate that we can recover almost all the conveniences of programming with inductive families thanks to smart constructors and views.

# 2 An Optimisation Example

The prototypical examples of the naïve compilation of inductive families being inefficient are probably the types of vectors (Vect) and finite numbers (Fin). Their interplay is demonstrated by the lookup function. Let us study this example and how successive optimisation passes can, in this instance, get rid of the overhead introduced by using indexed families over plain data.

A vector is a length-indexed list. The type Vect is parameterised by the type of values it stores and indexed over a natural number corresponding to its length. More concretely, its Nil constructor builds an empty vector of size Z (i.e. zero), and its (::) (pronounced 'cons') constructor combines a value of type a (the head) and a subvector of size n (the tail) to build a vector of size (S n) (i.e. successor of n).

```
data Vect : Nat -> Type -> Type where
  Nil : Vect Z a
  (::) : a -> Vect n a -> Vect (S n) a
```
The size n is not explicitly bound in the type of (::). In Idris 2, this means that it is automatically generalised over in a prenex manner reminiscent of the handling of free type variables in languages in the ML family. This makes it an implicit argument of the constructor. Consequently, given that Nat is a type of unary natural numbers, a naïve runtime representation of a (Vect n a) would have a size quadratic in n. A smarter representation with perfect sharing would still represent quite an overhead as observed by Brady, McBride, and McKinna [6].
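
The quadratic bound can be checked with a quick count. The Python sketch below is only a back-of-the-envelope model, not part of the paper's development: it assumes that a unary natural number k costs k + 1 constructor nodes, that each cons cell stores the length of its tail, and that nothing is shared. The function names are hypothetical.

```python
def unary_nodes(k):
    """Constructor nodes needed to store k in unary: k copies of S and one Z."""
    return k + 1


def naive_index_overhead(n):
    """Nat constructor nodes stored inside a naively compiled vector of length n.

    The cons cell whose tail has length k stores k as an implicit argument,
    so we sum over the tail lengths k = n-1, ..., 0 (assuming no sharing).
    """
    return sum(unary_nodes(k) for k in range(n))


# The overhead grows as n * (n + 1) / 2, i.e. quadratically in n.
overheads = [naive_index_overhead(n) for n in (1, 10, 100)]
```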

A finite number is a number known to be strictly smaller than a given natural number. The type Fin is indexed by said bound. Its Z constructor models 0 and is bound by any non-zero bound, and its S constructor takes a number bound by n and returns its successor, bound by (1 + n). A naïve compilation would here also lead to a runtime representation suffering from a quadratic blowup.

```
data Fin : Nat -> Type where
  Z : Fin (S n)
  S : Fin n -> Fin (S n)
```
This leads us to the definition of the lookup function. Provided a vector of size n and a finite number k bound by this same n, we can define a total function looking up the value stored at position k in the vector. It is guaranteed to return a value. Note that we do not need to consider the case of the empty vector in the pattern matching clauses as all of the return types of the Fin constructors force the index to be non-zero and, because the vector and the finite number talk about the same n, having an empty vector would automatically imply having a value of type (Fin 0) which is self-evidently impossible.

```
lookup : Vect n a -> Fin n -> a
lookup (x :: _) Z = x
lookup (_ :: xs) (S k) = lookup xs k
```
Thanks to our indexed family, we have gained the ability to define a function that cannot possibly fail, as well as the ability to only talk about the pattern matching clauses that make sense. This seemed to be at the cost of efficiency but luckily for us there has already been extensive work on erasure to automatically detect redundant data [6] or data that will not be used at runtime [22].

#### 2.1 Optimising **Vect**, **Fin**, and **lookup**

An analysis in the style of Brady, McBride, and McKinna's [6] can solve the quadratic blowup highlighted above by observing that the natural number a vector is indexed by is entirely determined by the spine of the vector. In particular, the length of the tail does not need to be stored as part of the constructor: it can be reconstructed as the predecessor of the length of the overall vector. As a consequence, a vector can be adequately represented at runtime by a pair of a natural number and a list. Similarly a bounded number can be adequately represented by a pair of natural numbers. Putting all of this together and remembering that the vector and the finite number share the same n, lookup can be compiled to a function taking two natural numbers and a list. In Idris 2 we would write the optimised lookup as follows (we use the **partial** keyword because this transformed version is not total at that type).

```
partial
lookup : (n : Nat) -> List a -> Nat -> a
lookup (S n) (x :: _) Z = x
lookup (S n) (_ :: xs) (S k) = lookup n xs k
```

We can see in the second clause that the recursive call is performed on the tail of the list (formerly vector) and so the first argument to lookup corresponding to the vector's size is decreased by one. The invariant, despite not being explicit anymore, is maintained.

A Tejiščák-style analysis [22] can additionally notice that the lookup function does not use the bound's value and drop it. This leads to the lookup function on vectors being compiled to its partial-looking counterpart acting on lists.

```
partial
lookup : List a -> Nat -> a
lookup (x :: _) Z = x
lookup (_ :: xs) (S k) = lookup xs k
```
Even though this is in our opinion a pretty compelling example of erasing away the apparent complexity introduced by inductive families, this approach has two drawbacks.

Firstly, it relies on the fact that the compiler can and will automatically perform these optimisations. But nothing in the type system prevents users from inadvertently using a value they thought would get erased, thus preventing the Tejiščák-style optimisation from firing. In performance-critical settings, users may rather want to state their intent explicitly and be kept to their word by the compiler in exchange for predictable and guaranteed optimisations.

Secondly, this approach is intrinsically limited to transformations that preserve the type's overall structure: the runtime data structures are simpler but very similar still. We cannot expect much better than that. It is so far unrealistic to expect e.g. a change of representation to use a balanced binary tree instead of a list in order to get logarithmic lookups rather than linear ones.
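
To illustrate what 'partial-looking' means for the fully erased lookup of this section, here is a Python sketch of its runtime shape: a plain list and a plain natural number. It is an analogy rather than the output of any compiler, and the explicit error stands in for the missing clause that the erased Fin index used to rule out.

```python
def lookup(xs, k):
    """Runtime shape of the erased lookup: a list and a natural number.

    Mirrors the two clauses of the list-based definition: return the head
    when the index is zero, otherwise recurse on the tail with the
    predecessor. There is no clause for the empty list, so the function
    is partial once the typing invariant is no longer visible.
    """
    if not xs:
        raise IndexError("no clause for the empty list")
    head, tail = xs[0], xs[1:]  # slicing copies; fine for a sketch
    if k == 0:
        return head
    return lookup(tail, k - 1)


second = lookup(["a", "b", "c"], 1)
```

In the typed source program the failing case cannot be reached, because the index and the vector share the same bound n.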

#### 2.2 No Magic Solution

Even if we are able to obtain a more compact representation of the inductive family at runtime through enough erasure, this does not guarantee runtime efficiency. As the Coq manual [11] reminds its users, extraction does not magically optimise away a user-defined quadratic multiplication algorithm when extracting unary natural numbers to an efficient machine representation. In a pragmatic move, Coq, Agda, and Idris 2 all have ad-hoc rules to replace convenient but inefficiently implemented numeric functions with asymptotically faster counterparts in the target language.

However this approach is not scalable: if we may be willing to extend our trusted core to a high quality library for unbounded integers, we do not want to replace our code only proven correct thanks to complex invariants with a wildly different untrusted counterpart purely for efficiency reasons.

In this paper we use Quantitative Type Theory [16,4] as implemented in Idris 2 [5] to bridge the gap between an invariant-rich but inefficient representation based on an inductive family and an unsafe but efficient implementation using low-level primitives. Inductive families allow us to view [24,18] the runtime relevant information encoded in the low-level and efficient representation as an information-rich compile time data structure. Moreover the quantity annotations guarantee the erasure of this additional information during compilation.

# 3 Some Key Features of Idris 2

Idris 2 implements Quantitative Type Theory, a Martin-Löf type theory enriched with a semiring of quantities classifying the ways in which values may be used. In a type, each binder is annotated with the quantity by which its argument must abide.

#### 3.1 Quantities

A value may be runtime irrelevant, linear, or unrestricted.

Runtime irrelevant values (**0** quantity) cannot possibly influence control flow as they will be erased entirely during compilation. This forces the language to impose strong restrictions on pattern-matching over these values. Typical examples are types like the a parameter in (List a), or indices like the natural number n in (Vect n a). These are guaranteed to be erased at compile time. The advantage over a Tejiščák-style analysis is that users can state their intent that an argument ought to be runtime irrelevant and the language will insist that it needs to be convinced it indeed is.

Linear values (**1** quantity) have to be used exactly once. Typical examples include the %World token used by Idris 2 to implement the IO monad à la Haskell, or file handles that cannot be discarded without first explicitly closing the file. At runtime these values can be updated destructively. We will not use linearity in this paper.

Last, unrestricted values (denoted by no quantity annotation) can flow into any position, be duplicated or thrown away. They are the usual immutable values of functional programming.

The most basic of examples mobilising both the runtime irrelevance and unrestricted quantities is the identity function.

```
id : {0 a : Type} -> (x : a) -> a
id x = x
```

Its type starts with a binder using curly braces. This means it introduces an implicit variable that does not need to be filled in by the user at call sites and will be reconstructed by unification. The variable it introduces is named a and has type Type. It has the **0** quantity annotation which means that this argument is runtime irrelevant and so will be erased during compilation.

The second binder uses parentheses. It introduces an explicit variable whose name is x and whose type is the type a that was just bound. It has no quantity annotation which means it will be an unrestricted variable.

Finally the return type is the type a bound earlier. This is, as expected, a polymorphic function from a to a. It is implemented using a single clause that binds x on the left-hand side and immediately returns it on the right-hand side.

If we were to try to annotate the binder for x with a **0** quantity to make it runtime irrelevant then Idris 2 would rightfully reject the definition. The following **failing** block shows part of the error message complaining that x cannot be used at an unrestricted quantity on the right-hand side.

```
failing "x is not accessible in this context."
  id : {0 a : Type} -> (0 x : a) -> a
  id x = x
```
#### 3.2 Proof Search

In Idris 2, Haskell-style ad-hoc polymorphism [25] is superseded by a more general proof search mechanism. Instead of having blessed notions of type classes, instances and constraints, the domain of any dependent function type can be marked as **auto**. This signals to the compiler that the corresponding argument will be an implicit argument and that it should not be reconstructed by unification alone but rather by proof search. The search algorithm will use the appropriate user-declared hints as well as the local variables in scope.

By default, a datatype's constructors are always added to the database of hints. And so the following declaration brings into scope both an indexed family So of proofs that a given boolean is True, and a unique constructor Oh that is automatically added as a hint.

```
data So : Bool -> Type where
  Oh : So True
```

As a consequence, we can for instance define a record type specifying what it means for n to be an even number by storing its half together with a proof that is both runtime irrelevant and filled in by proof search. Because (2 \* 3 == 6) computes to True, Idris 2 is able to fill in the missing proof in the definition of even6 using the Oh hint.

```
record Even (n : Nat) where
  constructor MkEven
  half : Nat
  {auto 0 prf : So (2 * half == n)}

even6 : Even 6
even6 = MkEven { half = 3 }
```
We will use both So and the **auto** mechanism in section 5.3.

#### 3.3 Application: **Vect**, as **List**

We can use the features of Quantitative Type Theory to give an implementation of Vect that is guaranteed to erase to a List at runtime independently of the optimisation passes implemented by the compiler. The advantage over the optimisation passes described in section 2 is that the user has control over the runtime representation and does not need to rely on these optimisations being deployed by the compiler.

The core idea is to make the slogan 'a vector is a length-indexed list' a reality by defining a record packing together the encoding as a list and a proof its length is equal to the expected Nat index. This proof is marked as runtime irrelevant to ensure that the list is the only thing remaining after compilation.

```
record Vect (n : Nat) (a : Type) where
  constructor MkVect
  encoding : List a
  0 valid : length encoding === n
```
**Smart constructors.** Now that we have defined vectors, we can recover the usual building blocks for vectors by defining smart constructors, that is to say functions Nil and (::) that act as replacements for the inductive family's data constructors.

```
Nil : Vect Z a
Nil = MkVect [] Refl
```

The smart constructor Nil returns an empty vector. It is, unsurprisingly, encoded as the empty list ([]). Because (length []) statically computes to Z, the proof that the encoding is valid can be discharged by reflexivity.

```
(::) : a -> Vect n a -> Vect (S n) a
x :: MkVect xs eq = MkVect (x :: xs) (cong S eq)
```
Using (::) we can combine a head and a tail of size n to obtain a vector of size (S n). The encoding is obtained by consing the head in front of the tail's encoding and the proof this is valid (cong S eq) uses the fact that propositional equality is a congruence and that (length (x :: xs)) computes to (S (length xs)).

**View.** Now that we know how to build vectors, we demonstrate that we can also take them apart using a view.

A view for a type T, in the sense of Wadler [24], and as refined by McBride and McKinna [18], is an inductive family V indexed by T together with a total function mapping every element t of T to a value of type (V t). This simple gadget provides a powerful, user-extensible, generalisation of pattern-matching. Patterns are defined inductively as either a pattern variable, a forced term (i.e. an arbitrary expression that is determined by a constraint arising from another pattern), or a data constructor fully applied to subpatterns. In contrast, the return indices of an inductive family's constructors can be arbitrary expressions.

In the case that interests us, the view allows us to emulate 'matching' on which of the two smart constructors Nil or (::) was used to build the vector being taken apart.

```
data View : Vect n a -> Type where
  Nil : View Nil
  (::) : (x : a) -> (xs : Vect n a) -> View (x :: xs)
```
The inductive family View is indexed by a vector and has two constructors corresponding to the two smart constructors. We use Idris 2's overloading capabilities to give each of the View's constructors the name of the smart constructor it corresponds to. By pattern-matching on a value of type (View xs), we will be able to break xs into its constitutive parts and either observe it is equal to Nil or recover its head and its tail.

```
view : (xs : Vect n a) -> View xs
view (MkVect [] Refl) = Nil
view (MkVect (x :: xs) Refl) = x :: MkVect xs Refl
```
The function view demonstrates that we can always tell which constructor was used by inspecting the encoding list. If it is empty, the vector was built using the Nil smart constructor. If it is not then we got our hands on the head and the tail of the encoding and (modulo some re-wrapping of the tail) they are effectively the head and the tail that were combined using the smart constructor.

**Application: map.** We can then use these constructs to implement the function map on vectors without ever having to explicitly manipulate the encoding. The maximally sugared version of map is as follows:

```
map : (a -> b) -> Vect n a -> Vect n b
map f xs@_ with (view xs)
  _ | [] = []
  _ | hd :: tl = f hd :: map f tl
```
On the left-hand side the view lets us seamlessly pattern-match on the input vector. Using the **with** keyword we have locally modified the function definition so that it takes an extra argument, here the result of the intermediate computation (view xs). Correspondingly, we have two clauses matching on this extra argument; the symbol **|** separates the original left-hand side (here elided using **\_** because it is exactly the same as in the parent clause) from the additional pattern. This pattern can either have the shape [] or (hd :: tl) and, correspondingly, we learn that xs is either [] or (hd :: tl).

On the right-hand side the smart constructors let us build the output vector. Mapping a function over the empty vector yields the empty vector while mapping over a cons node yields a cons node whose head and tail have been modified.
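
For readers less familiar with views, the same shape can be mimicked in an untyped setting. The Python sketch below is only an analogy under loud assumptions: vectors are plain lists (their erased encoding), the view is a tagged tuple, and the names `view` and `vmap` are hypothetical. None of the typing guarantees of the Idris 2 version are captured.

```python
def view(xs):
    """Expose a list-encoded vector as either ('nil',) or ('cons', head, tail)."""
    if not xs:
        return ("nil",)
    return ("cons", xs[0], xs[1:])


def vmap(f, xs):
    """map written against the view rather than against the list encoding."""
    v = view(xs)
    if v[0] == "nil":
        return []                      # mapping over the empty vector
    _, hd, tl = v
    return [f(hd)] + vmap(f, tl)       # modified head consed onto the mapped tail


incremented = vmap(lambda x: x + 1, [1, 2, 3])
```

The client code in `vmap` only inspects the result of `view`, which is the programming pattern that the with-clauses express in Idris 2.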

This sugared version of map is equivalent to the following more explicit one:

```
map : (a -> b) -> Vect n a -> Vect n b
map f xs with (view xs)
  map f .([]) | [] = []
  map f .(hd :: tl) | hd :: tl = f hd :: map f tl
```
In the parent clause we have explicitly bound xs instead of merely introducing an alias for it by writing (xs@\_) and so we will need to be explicit about the ways in which this pattern is refined in the two with-clauses.

In the with-clauses, we have explicitly repeated the refined version of the parent clause's left-hand side. In particular we have used dotted patterns to insist that xs is now entirely forced by the match on the result of (view xs).

We have seen that by matching on the result of the (view xs) call, we get to 'match' on xs as if Vect were an inductive type. This is the power of views.

**Application: lookup.** The type (Fin n) can similarly be represented by a single natural number and a runtime irrelevant proof that it is bound by n. We leave these definitions out, and invite the curious reader to either attempt to implement them for themselves or look at the accompanying code.

Bringing these definitions together, we can define a lookup function which is similar to the one defined in section 2.

```
lookup : Vect n a -> Fin n -> a
lookup xs@_ k@_ with (view xs) | (view k)
  _ | hd :: _ | Z = hd
  _ | _ :: tl | S k' = lookup tl k'
```
We are seemingly using view at two different types (Vect and Fin respectively) but both occurrences actually refer to separate functions: Idris 2 lets us overload functions and performs type-directed disambiguation.

For pedagogical purposes, this sugared version of lookup can also be expanded to a more explicit one that demonstrates the views' power.

```
lookup : Vect n a -> Fin n -> a
lookup xs k with (view xs) | (view k)
  lookup .(hd :: tl) .(Z) | hd :: tl | Z = hd
  lookup .(hd :: tl) .(S k') | hd :: tl | S k' = lookup tl k'
```
The main advantage of this definition is that, based on its type alone, we know that this function is guaranteed to be processing a list and a single natural number at runtime. This efficient runtime representation does not rely on the assumption that state of the art optimisation passes will be deployed.

We have seen some of Idris 2's powerful features and how they can be leveraged to empower users to control the runtime representation of the inductive families they manipulate. This simple example only allowed us to reproduce the performance that could already be achieved by compilers deploying state of the art optimisation passes. In the following sections, we are going to see how we can use the same core ideas to compile an inductive family to a drastically different runtime representation while keeping good high-level ergonomics.

# 4 Thinnings, Cooked Two Ways

We experienced a major limitation of compilation of inductive families during our ongoing development of TypOS [2], a domain specific language to define concurrent typecheckers and elaborators. Core to this project is the definition of actors manipulating a generic notion of syntax with binding. Internally the terms of this syntax with binding are based on a co-de Bruijn representation (an encoding we will explain below) which relies heavily on thinnings. A thinning (also known as an Order Preserving Embedding [9]) between a source and a target scope is an order preserving injection of the smaller scope into the larger one. They are usually represented using an inductive family. The omnipresence of thinnings in the co-de Bruijn representation makes their runtime representation a performance critical matter.

Let us first remind the reader of the structure of abstract syntax trees in a named, a de Bruijn, and a co-de Bruijn representation. We will then discuss two representations of thinnings: a safe and convenient one as an inductive family, and an unsafe but efficient encoding as a pair of arbitrary precision integers.

#### 4.1 Named, de Bruijn, and co-de Bruijn Syntaxes

In this section we will use the S combinator (λg.λf.λx.gx(fx)) as a running example and represent terms using a syntax tree whose constructor nodes are circles and variable nodes are squares. To depict the S combinator we will only need λ-abstraction and application (rendered \$) nodes. A constructor's arguments become its children in the tree. The tree is laid out left-to-right and a constructor's arguments are displayed top-to-bottom.

**Named Syntax.** The first representation is using explicit names. Each binder has an associated name and each variable node carries a name. A variable refers to the closest enclosing binder which happens to be using the same name.

Checking whether two terms are structurally equivalent (α-equivalence) potentially requires renaming bound names. In order to have a simple and cheap α-equivalence check we can instead opt for a nameless representation.

**De Bruijn Syntax.** An abstract syntax tree based on de Bruijn indices [8] replaces names with natural numbers counting the number of binders separating a variable from its binding site. The S combinator is now written (λ λ λ 2 0 (1 0)).

You can see in the following graphical depiction that λ-abstractions do not carry a name anymore and that variables are simply pointing to the binder that introduced them. We have left the squares empty but in practice the various coloured arrows would be represented by a natural number. For instance the dashed magenta one corresponds to 1 because you need to ignore one λ-abstraction (the orange one) on your way towards the root of the tree before you reach the corresponding magenta binder.
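
The translation from named terms to de Bruijn indices is mechanical. The Python sketch below uses a hypothetical tuple encoding of terms (it is not TypOS code) and assumes closed terms; it converts the S combinator and renders the result in the compact notation used above.

```python
# Named terms (hypothetical tuple encoding):
#   ("var", name) | ("lam", name, body) | ("app", function, argument)

def to_de_bruijn(term, binders=()):
    """Replace each variable name by the number of binders separating it
    from its binding site. `binders` lists the names bound so far,
    innermost first. Assumes the term is closed."""
    tag = term[0]
    if tag == "var":
        return ("var", binders.index(term[1]))  # closest enclosing binder wins
    if tag == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + binders))
    if tag == "app":
        return ("app",
                to_de_bruijn(term[1], binders),
                to_de_bruijn(term[2], binders))
    raise ValueError(f"unknown term: {term!r}")


def show(term):
    """Render a de Bruijn term, parenthesising only where needed."""
    tag = term[0]
    if tag == "var":
        return str(term[1])
    if tag == "lam":
        return "λ " + show(term[1])
    fun, arg = term[1], term[2]
    fun_str = "(" + show(fun) + ")" if fun[0] == "lam" else show(fun)
    arg_str = show(arg) if arg[0] == "var" else "(" + show(arg) + ")"
    return fun_str + " " + arg_str


def var(name):
    return ("var", name)


# S = λg.λf.λx. g x (f x)
s_combinator = ("lam", "g", ("lam", "f", ("lam", "x",
    ("app", ("app", var("g"), var("x")), ("app", var("f"), var("x"))))))

rendered = show(to_de_bruijn(s_combinator))  # "λ λ λ 2 0 (1 0)"
```

Two α-equivalent named terms, such as λx.x and λy.y, are mapped to the same nameless term, which is what makes the equivalence check a plain structural comparison.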

To check whether a subterm does not mention a given set of variables (a thickening test, the opposite of a thinning which extends the current scope with unused variables), you need to traverse the whole term. In order to have a simple and cheap thickening test we can ensure that each subterm knows precisely what its support is and how it embeds in its parent's.

**Co-de Bruijn Syntax.** In a co-de Bruijn representation [17] each subterm selects exactly the variables that stay in scope for that term, and so a variable constructor ultimately refers to the only variable still in scope by the time it is reached. This representation ensures that we know precisely what the scope of a given term currently is.

In the following graphical rendering, we represent thinnings as lists of full (•) or empty (◦) discs depending on whether the corresponding variable is either kept or discarded. For instance the thinning represented by ◦•• throws the blue variable away, and keeps both the magenta and orange ones.
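
The disc notation maps directly onto bits, which is the intuition behind encoding a thinning as a pair of integers. The Python sketch below is a minimal illustration under assumptions of our own: the names `encode` and `decode` are hypothetical, and the choice of making the leftmost disc the most significant bit is a convention picked for this example only.

```python
def encode(discs):
    """Pack a thinning written with discs into a (size, bits) pair.

    '•' (kept) becomes a 1 bit and '◦' (discarded) a 0 bit. By assumption
    the leftmost disc is the most significant bit. The size is stored
    separately because leading '◦' discs would otherwise be lost.
    """
    bits = 0
    for disc in discs:
        if disc not in ("•", "◦"):
            raise ValueError(f"not a disc: {disc!r}")
        bits = (bits << 1) | (1 if disc == "•" else 0)
    return len(discs), bits


def decode(size, bits):
    """Inverse of encode: recover the disc notation from (size, bits)."""
    return "".join("•" if (bits >> i) & 1 else "◦"
                   for i in reversed(range(size)))


size, bits = encode("◦••")   # three variables in scope, the last two are kept
```

Python integers are arbitrary precision, so the same pair works for scopes of any size.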

We can see that in such a representation, each node in the tree stores one thinning per subterm. This will not be tractable unless we have an efficient representation of thinnings.

#### 4.2 The Performance Challenges of co-de Bruijn

Using the co-de Bruijn approach, a term in an arbitrary context is represented by the pairing of a term in co-de Bruijn syntax with a thinning from its support into the wider scope. Having such a precise handle on each term's support allows us to make operations such as thinning, substitution, unification, or common sub-expression elimination more efficient.

Thinning a term does not require us to traverse it anymore. Indeed, embedding a term in a wider context will not change its support and so we can simply compose the two thinnings while keeping the term the same.
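
Composition of thinnings can be sketched on a bit-level encoding. The Python code below is an illustration under stated assumptions, not the implementation studied in this paper: a thinning is a hypothetical (size, bits) pair where size is the size of the larger scope and bit i (least significant first) records whether the i-th variable of that scope is kept.

```python
def compose(inner, outer):
    """Compose two thinnings given as (size, bits) pairs.

    `inner` embeds a scope A into a scope B, and `outer` embeds B into a
    scope C. So `inner` has one bit per variable of B and `outer` has one
    bit per variable of C. The result embeds A into C.
    """
    inner_size, inner_bits = inner
    outer_size, outer_bits = outer
    if bin(outer_bits).count("1") != inner_size:
        raise ValueError("outer must keep exactly the variables inner ranges over")
    result = 0
    j = 0  # index of the current variable of B
    for i in range(outer_size):
        if (outer_bits >> i) & 1:        # variable i of C comes from B
            if (inner_bits >> j) & 1:    # and that variable of B comes from A
                result |= 1 << i
            j += 1
    return outer_size, result


# B has two variables and only the first is used; B sits at positions 1 and 2 of C.
composed = compose((2, 0b01), (3, 0b110))
```

The term itself never appears in this computation: only the outer thinning of the (term, thinning) pair is updated.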

Substitution can avoid traversing subterms that will not be changed. Indeed, it can now easily detect when the substitution's domain does not intersect with the subterm's support.

Unification requires performing thickening tests when we want to solve a metavariable declared in a given context with a term seemingly living in a wider one. We once more do not need to traverse the term to perform this test, and can simply check whether the outer thinning can be thickened.
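One way to picture such a test on packed thinnings is sketched below in Python. This is an assumption about how the check could be phrased, not the algorithm used in TypOS: thinnings are (big end, bit pattern) pairs with bit 0 for the most local variable, and thicken is a hypothetical name.

```python
def thicken(th, rho):
    """Try to factor th : support -> wide through rho : narrow -> wide.

    Returns the thinning support -> narrow, or None if th keeps a
    variable that rho discards.
    """
    (n, bs), (m, keep) = th, rho
    assert n == m, "both thinnings must target the same wide scope"
    if bs & ~keep:                    # some kept variable is not in narrow
        return None
    bits, pos = 0, 0
    for i in range(n):
        if (keep >> i) & 1:           # only positions surviving in narrow
            bits |= ((bs >> i) & 1) << pos
            pos += 1
    return (pos, bits)

# The support only mentions the oldest of three variables.
assert thicken((3, 0b100), (3, 0b101)) == (2, 0b10)   # success
assert thicken((3, 0b010), (3, 0b101)) is None        # middle variable needed
```

The failure case is decided by a single bitwise operation on the outer thinning, without looking at the term.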

Common sub-expression elimination requires us to identify alpha-equivalent terms potentially living in different contexts. Using a de Bruijn representation, these can be syntactically different: a variable represented by the natural number v in Γ would be (1+v) in Γ, σ but (2+v) in Γ, τ, ν. A co-de Bruijn representation, by discarding all the variables not in the support, guarantees that we can once more use syntactic equality to detect alpha-equivalence. This encoding is used for instance (albeit unknowingly) by Maziarz, Ellis, Lawrence, Fitzgibbon, and Peyton Jones in their 'Hashing modulo alpha-equivalence' work [14].

For all of these reasons we have, as we mentioned earlier, opted for a co-de Bruijn representation in the implementation of TypOS [2]. And so it is crucial for performance that we have a compact representation of thinnings.

**Thinnings in TypOS** We first carefully worked out the trickier parts of the implementation in Agda before porting the resulting code to Haskell. This process highlighted a glaring gap between, on the one hand, the experiments done using a strongly typed inductive representation of thinnings and, on the other hand, their more efficient but unsafe encoding in Haskell.

**Agda** The Agda-based experiments use inductive families that make the key invariants explicit, which helps track complex constraints and catches design flaws at typechecking time. The indices guarantee that we always transform the thinnings appropriately when we add or remove bound variables. In Idris 2, the inductive family representation of thinnings would be written:

```
data Thinning : (sx, sy : SnocList a) -> Type where
  Done : Thinning [<] [<]
  Keep : Thinning sx sy -> (0 x : a) -> Thinning (sx :< x) (sy :< x)
  Drop : Thinning sx sy -> (0 x : a) -> Thinning sx (sy :< x)
```
The Thinning family is indexed by two scopes (represented as snoclists i.e. lists that are extended from the right, just like contexts in inference rules): sx the tighter scope and sy the wider one. The Done constructor corresponds to a thinning from the empty scope to itself ([<] is Idris 2 syntactic sugar for the empty snoclist), and Keep and Drop respectively extend a given thinning by keeping or dropping the most local variable (:< is the 'snoc' constructor, a sort of flipped 'cons'). The 'name' (x of type a) is marked with the quantity **0** to ensure it is erased at compile time (cf. section 3).

During compilation, Idris 2 would erase the families' indices as they are forced (in the sense of Brady, McBride, and McKinna [6]), and drop the constructor arguments marked as runtime irrelevant. The resulting inductive type would be the following simple data type.

```
data Thinning = Done | Keep Thinning | Drop Thinning
```

At runtime this representation is therefore essentially a linked list of booleans (Done being Nil, and Keep and Drop respectively (True ::) and (False ::)).

**Haskell** The Haskell implementation uses this observation and picks a packed encoding of this list of booleans as a pair of integers. One integer represents the length n of the list, and the other integer's n least significant bits encode the list as a bit pattern where 1 is Keep and 0 is Drop.
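As an illustration of this packing, the Python sketch below converts between a list of booleans and a (length, bit pattern) pair. It is only a sketch of the idea and not the TypOS code; the names pack, unpack, and render are hypothetical, and we assume that the head of the list (the most local variable) is stored in bit 0.

```python
def pack(flags):
    """flags lists the Keep (True) / Drop (False) choices, most local first."""
    bits = 0
    for i, keep in enumerate(flags):
        if keep:
            bits |= 1 << i
    return (len(flags), bits)

def unpack(th):
    n, bits = th
    return [bool((bits >> i) & 1) for i in range(n)]

def render(th):
    """Draw the thinning with full and empty discs, most local variable last."""
    n, bits = th
    return "".join("•" if (bits >> i) & 1 else "◦" for i in reversed(range(n)))

th = pack([True, False, True, True, False])
assert th == (5, 0b01101)
assert unpack(th) == [True, False, True, True, False]
assert render(th) == "◦••◦•"
```

Note that Python integers are unbounded, so the bit pattern can grow with the scope.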

Basic operations on thinnings are implemented by explicitly manipulating individual bits. The encoding is not indexed and thus all the invariant tracking has to be done by hand. This has led to numerous and hard-to-diagnose bugs.

**Thinnings in Idris 2** Idris 2 is a self-hosting language whose core datatype is currently based on a well-scoped de Bruijn representation. This precise indexing of terms by their scope helped entirely eliminate a whole class of bugs that plagued Idris 1's unification machinery.

If we were to switch to a co-de Bruijn representation for our core language we would want, and should be able, to have the best of both worlds: a safe and efficient representation!

Thankfully Idris 2 implements Quantitative Type Theory (QTT) which gives us a lot of control over what is to be runtime relevant and what is to be erased during compilation. This should allow us to insist on having a high-level interface that resembles an inductive family while ensuring that everything but a pair of integers is erased at compile time. We will exploit the key features of QTT presented in section 3 to have our cake and eat it.

# 5 An Efficient Invariant-Rich Representation

We can combine both approaches highlighted in section 4.2 by defining a record parameterised by a source (sx) and a target (sy) scope corresponding to the two ends of the thinning, just like we would for the inductive family. This record packs two numbers and a runtime irrelevant proof.

Firstly, we have a natural number called bigEnd corresponding to the size of the big end of the thinning (sy). We are happy to use a (unary) natural number here because we know that Idris 2 will compile it to an unbounded integer.

Secondly, we have an integer called encoding corresponding to the thinning represented as a bit vector stating, for each variable, whether it is kept or dropped. We only care about the integer's bigEnd least significant bits and assume the rest is set to 0.

Thirdly, we have a runtime irrelevant proof invariant that encoding is indeed a valid encoding of size bigEnd of a thinning from sx to sy. We will explore the definition of the relation Invariant later on in section 5.3.

```
record Th {a : Type} (sx, sy : SnocList a) where
  constructor MkTh
  bigEnd : Nat
  encoding : Integer
  0 invariant : Invariant bigEnd encoding sx sy
```
The first sign that this definition is adequate is our ability to construct any valid thinning. We demonstrate it is the case by introducing functions that act as smart constructor analogues for the inductive family's data constructors.

#### 5.1 Smart Constructors for **Th**

The first and simplest one is done, a function that packs a pair of 0 (the size of the big end, and the empty encoding) together with a proof that it is an adequate encoding of the thinning from the empty scope to itself. In this instance, the proof is simply the Done constructor.

```
done : Th [<] [<]
done = MkTh { bigEnd = 0, encoding = 0, invariant = Done }
```

To implement both keep and drop, we are going to need to perform bit-level manipulations. These are made easy by Idris 2's Bits interface which provides us with functions to shift the bit patterns left or right (shiftl, shiftr), set or clear bits at specified positions (setBit, clearBit), take bitwise logical operations like disjunction (.|.) or conjunction (.&.), etc.

In both keep and drop, we need to extend the encoding with an additional bit. For this purpose we introduce the cons function which takes a bit b and an existing encoding bs and returns the new encoding bs·b.

```
cons : Bool -> Integer -> Integer
cons b bs = let bs0 = bs `shiftL` 1 in
            if b then (bs0 `setBit` 0) else bs0
```
No matter what the value of the new bit is, we start by shifting the encoding to the left to make space for it; this gives us bs0 which contains the bit pattern bs·0. If the bit is True then we need to additionally set the bit at position 0 to obtain bs · 1. Otherwise if the bit is False, we can readily return the bs · 0 encoding obtained by left shifting. The correctness of this function is backed by two lemmas: testing the bit at index 0 after consing amounts to returning the cons'd bit, and shifting the cons'd encoding to the right takes us back to the unextended encoding.

```
testBit0Cons : (b : Bool) -> (bs : Integer) ->
               testBit (cons b bs) 0 === b
consShiftR : (b : Bool) -> (bs : Integer) ->
             (cons b bs) `shiftR` 1 === bs
```
The keep smart constructor demonstrates that from a thinning from sx to sy and a runtime irrelevant variable x we can compute a thinning from the extended source scope (sx :< x) to the target scope (sy :< x) where x was kept.

```
keep : Th sx sy -> (0 x : a) -> Th (sx :< x) (sy :< x)
keep th x = MkTh
 { bigEnd = S (th .bigEnd)
 , encoding = cons True (th .encoding)
 , invariant =
    let 0 b = eqToSo $ testBit0Cons True (th .encoding) in
    Keep (rewrite consShiftR True (th .encoding) in th.invariant) x
 }
```
The outer scope has grown by one variable and so we increment bigEnd. The encoding is obtained by cons-ing the boolean True to record the fact that this new variable is kept. Finally, we use the two lemmas shown above to convince Idris 2 the invariant has been maintained.

Similarly the drop function demonstrates that we can compute a thinning getting rid of the variable x freshly added to the target scope.

```
drop : Th sx sy -> (0 x : a) -> Th sx (sy :< x)
drop th x = MkTh
 { bigEnd = S (th .bigEnd)
 , encoding = cons False (th .encoding)
 , invariant =
   let 0 prf = testBit0Cons False (th .encoding)
       0 nb = eqToSo $ cong not prf in
   Drop (rewrite consShiftR False (th .encoding) in th .invariant) x
 }
```
We once again increment the bigEnd, use cons to record that the variable is being discarded and use the lemmas ensuring its correctness to convince Idris 2 the invariant is maintained.

We can already deploy these smart constructors to implement functions producing thinnings. We use which as our example. It is a filter-like function that returns a dependent pair containing the elements that satisfy a boolean predicate together with a proof that there is a thinning embedding them back into the input snoclist.

```
which : (a -> Bool) -> (sy : SnocList a) ->
        (sx : SnocList a ** Th sx sy)
which p [<] = ([<] ** done)
which p (sy :< y) =
  let (sx ** th) = which p sy in
  if p y then (sx :< y ** keep th y)
         else (sx ** drop th y)
```
If the input snoclist is empty then the output shall also be, and done builds a thinning from [<] to itself. If it is not empty we can perform a recursive call on the tail of the snoclist and then depending on whether the predicate holds true of the head we can either keep or drop it.
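At runtime, once indices and proofs are erased, which boils down to a filter that additionally accumulates a bit pattern. The Python sketch below mimics this erased behaviour under the assumption that a snoclist is modelled by a Python list whose last element is the most local one; it is an illustration and not the code Idris 2 generates.

```python
def which(p, sy):
    """Return the elements satisfying p and the thinning embedding them in sy.

    The thinning is a (big_end, bits) pair; bit 0 is the most local element.
    """
    sx, big_end, bits = [], 0, 0
    for y in sy:                      # oldest element first
        if p(y):
            sx.append(y)
            bits = (bits << 1) | 1    # keep: cons True
        else:
            bits = bits << 1          # drop: cons False
        big_end += 1
    return sx, (big_end, bits)

evens, th = which(lambda n: n % 2 == 0, [1, 2, 3, 4])
assert evens == [2, 4]
assert th == (4, 0b0101)
```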

We are now equipped with these smart constructors that allow us to seamlessly build thinnings. To recover the full expressive power of the inductive family, we also need to be able to take these thinnings apart. Let us now tackle this issue.

#### 5.2 Pattern Matching on **Th**

The View family is a sum type indexed by a thinning. It has one data constructor associated to each smart constructor and storing its arguments.

```
data View : Th sx sy -> Type where
  Done : View done
  Keep : (th : Th sx sy) -> (0 x : a) -> View (keep th x)
  Drop : (th : Th sx sy) -> (0 x : a) -> View (drop th x)
```
The accompanying view function witnesses the fact that any thinning arises as one of these three cases.

```
view : (th : Th sx sy) -> View th
```

We show the implementation of view in its entirety but leave out the technical auxiliary lemmas it invokes. The interested reader can find them in the accompanying material. We will however inspect the code view compiles to after erasure in section 5.5 to confirm that these auxiliary definitions do not incur any additional runtime cost.

We start by pattern matching on the bigEnd of the thinning. If it is 0 then we know the thinning has to be the empty thinning. Thanks to an inversion lemma called isDone, we can collect a lot of equality proofs: the encoding bs has to be 0, the source and target scopes sx and sy have to be the empty snoclists, and the proof prf of the invariant has to be of a specific shape. Rewriting by these equalities changes the goal type enough for the typechecker to ultimately see that the thinning was constructed using the done smart constructor and so we can use the view's Done constructor.

```
view (MkTh 0 bs prf) =
  let 0 eqs = isDone prf in
  rewrite bsIsZero eqs in
  rewrite fstIndexIsLin eqs in
  rewrite sndIndexIsLin eqs in
  rewrite invariantIsDone eqs in
  Done
```
In case the thinning is non-empty, we need to inspect the 0-th bit of the encoding to know whether it keeps or discards its most local variable. This is done by calling the choose function which takes a boolean b and returns a value of type (Either (So b) (So (not b))) i.e. we not only inspect the boolean but also record which value we got in a proof using the So family introduced in section 3.

```
view (MkTh (S i) bs prf) = case choose (testBit bs Z) of
```
If the bit is set then we know the variable is kept. And so we can invoke an inversion lemma that will once again provide us with a lot of equalities that we immediately deploy to reshape the goal's type. This ultimately lets us assemble a sub-thinning and use the view's Keep constructor.

```
Left so =>
  let 0 eqs = isKeep prf so in
  rewrite fstIndexIsSnoc eqs in
  rewrite sndIndexIsSnoc eqs in
  rewrite invariantIsKeep eqs in
  rewrite isKeepInteger bs so in
  let th : Th eqs.fstIndexTail eqs.sndIndexTail
      th = MkTh i (bs `shiftR` 1) eqs.subInvariant in
  cast $ Keep th eqs.keptHead
```
If the bit is not set then we learn that the thinning was constructed using drop. We can once again use an inversion lemma to rearrange the goal and finally invoke the view's Drop constructor.

```
Right soNot =>
  let 0 eqs = isDrop prf soNot in
  rewrite sndIndexIsSnoc eqs in
  rewrite invariantIsDrop eqs in
  rewrite isDropInteger bs soNot in
  let th : Th sx eqs.sndIndexTail
      th = MkTh i (bs `shiftR` 1) eqs.subInvariant in
  cast $ Drop th eqs.keptHead
```
We can readily use this function to implement pattern matching functions taking a thinning apart. We can for instance define kept, the function that counts the number of keep smart constructors used when manufacturing the input thinning and returns a proof that this is exactly the length of the source scope sx.

```
kept : Th sx sy -> (n : Nat ** length sx === n)
kept th = case view th of
  Done => (0 ** Refl)
  Keep th x => let (n ** eq) = kept th in
               (S n ** cong S eq)
  Drop th x => kept th
```
We proceed by calling the view function on the input thinning which immediately tells us that we only have three cases to consider. The Done case is easily handled because the branch's refined types inform us that both sx and sy are the empty snoclist [<] whose length is evidently 0. In the Keep branch we learn that sx has the shape (\_ :< x) and so we must return the successor of whatever the result of the recursive call gives us. Finally in the Drop case, sx is untouched and so a simple recursive call suffices. Note that the function is correctly detected as total because the target scope sy is indeed getting structurally smaller at every single recursive call. It is runtime irrelevant but it can still be successfully used as a termination measure by the compiler.
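Once the proof component is erased, kept amounts to counting the set bits among the bigEnd least significant bits of the encoding. The Python sketch below follows the same three-way case analysis on a (big end, bit pattern) pair; it is a hand-written illustration, not compiler output.

```python
def kept(th):
    """Count the variables kept by a thinning given as (big_end, bits)."""
    big_end, bits = th
    count = 0
    while big_end > 0:                # Done is reached when big_end is 0
        if bits & 1:                  # Keep: one more variable in sx
            count += 1
        bits >>= 1                    # Keep and Drop: move on to the tail
        big_end -= 1
    return count

assert kept((0, 0)) == 0
assert kept((5, 0b01101)) == 3
# Agrees with a plain population count on valid encodings.
assert kept((5, 0b01101)) == bin(0b01101).count("1")
```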

#### 5.3 The **Invariant** Relation

We have shown the user-facing Th and have claimed that it is possible to define smart constructors done, keep, and drop, as well as a view function. This should become apparent once we show the actual definition of Invariant.

**Definition of Invariant** The relation maintains the invariant between the record's fields bigEnd (a Nat) and encoding (an Integer) and the index scopes sx and sy. Its definition can favour ease-of-use over runtime efficiency because we statically know that all of the Invariant proofs will be erased during compilation.

```
data Invariant : (i : Nat) -> (bs : Integer) ->
                 (sx, sy : SnocList a) -> Type where
  Done : Invariant Z 0 [<] [<]
  Keep : Invariant i (bs `shiftR` 1) sx sy -> (0 x : a) ->
         {auto 0 b : So (testBit bs Z)} ->
         Invariant (S i) bs (sx :< x) (sy :< x)
  Drop : Invariant i (bs `shiftR` 1) sx sy -> (0 x : a) ->
         {auto 0 nb : So (not (testBit bs Z))} ->
         Invariant (S i) bs sx (sy :< x)
```
As always, the Done constructor is the simplest. It states that the thinning of size Z and encoded as the bit pattern 0 is the empty thinning.

The Keep constructor guarantees that the thinning of size (S i) and encoding bs represents an injection from (sx :< x) to (sy :< x) provided that the bit at position Z of bs is set, and that the rest of the bit pattern (obtained by a right shift on bs) is a valid thinning of size i from sx to sy.

The Drop constructor is structured the same way, except that it insists the bit at position Z should not be set.

We can readily use this relation to prove that some basic encodings are valid representations of useful thinnings.

**Examples of Invariant proofs** For instance, we can always define a thinning from the empty scope to an arbitrary scope sy.

```
none : (sy : SnocList a) -> Th [<] sy
none sy = MkTh (length sy) 0 (none sy)
```
The encoding of this thinning is 0 because every variable is being discarded and its bigEnd is the length of the outer scope sy. The validity proof is provided by the none lemma proven below. We once again use Idris 2's overloading to give the same name to functions that play similar roles but at different types.

```
none : (sy : SnocList a) -> Invariant (length sy) 0 [<] sy
none [<] = Done
none (sy :< y) = Drop (none sy) y
```
The proof proceeds by induction over the outer scope sy. If it is empty, we can simply use the constructor for the empty thinning. Otherwise we can invoke Drop on the induction hypothesis. This all typechecks because (testBit 0 Z) computes to False and so the nb proof can be constructed automatically by Idris 2's proof search (cf. section 3.2), and (0 'shiftR' 1) evaluates to 0 which means the induction hypothesis has exactly the right type.

The definition of the identity thinning is a bit more involved. For a scope of size n, we are going to need to generate a bit pattern consisting of n ones. We define it in two steps. First, cofull defines a bit pattern of k zeros followed by infinitely many ones by shifting k places to the left a bit pattern of ones only. Then, we obtain full by taking the complement of cofull.

```
cofull : Nat -> Integer
cofull n = oneBits `shiftL` n

full : Nat -> Integer
full n = complement (cofull n)
```
We can then define the identity thinning for a scope of size n by pairing (full n) as the encoding and n as the bigEnd.

```
ones : (sx : SnocList a) -> Th sx sx
ones sx = let n : Nat; n = length sx in MkTh n (full n) (ones sx)
```

The bulk of the work is once again in the eponymous lemma proving that this encoding is valid.

```
ones : (sx : SnocList a) ->
       let n = length sx in Invariant n (full n) sx sx
ones [<] = Done
ones (sx :< x) =
  let 0 nb = eqToSo (testBitFull (S (length sx)) Z) in
  Keep (rewrite shiftRFull (length sx) in ones sx) x
```
This proof proceeds once more by induction on the scope. If the scope is empty then once again the constructor for the empty thinning will do. In the nonempty case, we first appeal to an auxiliary lemma (not shown here) to construct a proof nb that the bit at position Z for a non-zero full integer is known to be True. We then need to use another lemma to cast the induction hypothesis which mentions (full (length sx)) so that it may be used in a position where we expect a proof talking about (full (length (sx :< x)) 'shiftR' 1).
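The two-step construction of full and the two facts the proof relies on can be replayed on concrete numbers. In the Python sketch below, -1 plays the role of oneBits since Python integers behave like unbounded two's complement numbers; the checks are spot tests standing in for the lemmas testBitFull and shiftRFull, not proofs.

```python
ONE_BITS = -1                         # infinitely many 1s in two's complement

def cofull(n):
    """n zeros followed by infinitely many ones (reading from bit 0)."""
    return ONE_BITS << n

def full(n):
    """n ones followed by infinitely many zeros."""
    return ~cofull(n)

assert full(0) == 0
assert full(3) == 0b111
for n in range(16):
    assert full(n + 1) & 1 == 1           # bit 0 of a non-zero full is set
    assert full(n + 1) >> 1 == full(n)    # shifting right gives the smaller full
```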

**Properties of the Invariant relation** This relation has a lot of convenient properties.

First, it is proof irrelevant: any two proofs that the same i, bs, sx, and sy are related are provably equal. Consequently, equality on Th values amounts to equality of the bigEnd and encoding values. In particular it is cheap to test whether a given thinning is the empty or the identity thinning.

Second, it can be inverted [12] knowing only two bits: whether the natural number is zero and what the value of the bit at position Z of the encoding is. This is what allowed us to efficiently implement the view function by using these two checks and then inverting the Invariant proof to gain access to the proof that the remainder of the thinning's encoding is valid. We will see in section 5.5 that this leads to efficient runtime code for the view.

#### 5.4 Choose Your Own Abstraction Level

Access to both the high-level View and the internal Invariant relation means that programmers can pick the level of abstraction at which they want to work. They may need to explicitly manipulate bits to implement key operators that are used in performance-critical paths but can also stay at the highest level for more negligible operations, or when proving runtime irrelevant properties.

In the previous section we saw simple examples of these bit manipulations when defining none (using the constant 0 bit pattern) and ones (using bit shifting and complement to form an initial segment of 1s followed by 0s).

Other natural examples include the meet and join of two thinnings sharing the same wider scope. The join can for instance be thought of as a function defined by induction on the first thinning and case analysis on the second, emitting a Keep constructor whenever either of the inputs does. Alternatively, we can observe that the bit pattern of the join is the disjunction of the inputs' bit patterns and prove a lemma about the Invariant relation instead. This can be visualised as follows: in each column the join is a • whenever either of the inputs is.
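On the packed representation both operations are a single bitwise instruction. The Python sketch below works on (big end, bit pattern) pairs and only shows the runtime content; the Invariant lemma that would accompany each operation in Idris 2 is not modelled.

```python
def join(th1, th2):
    """Keep a variable whenever either input keeps it (bitwise disjunction)."""
    (n1, bs1), (n2, bs2) = th1, th2
    assert n1 == n2, "both thinnings must share the same wider scope"
    return (n1, bs1 | bs2)

def meet(th1, th2):
    """Keep a variable only when both inputs keep it (bitwise conjunction)."""
    (n1, bs1), (n2, bs2) = th1, th2
    assert n1 == n2, "both thinnings must share the same wider scope"
    return (n1, bs1 & bs2)

assert join((3, 0b011), (3, 0b110)) == (3, 0b111)
assert meet((3, 0b011), (3, 0b110)) == (3, 0b010)
```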

The join is of particular importance because it appears when we convert an 'opened' view of a term into its co-de Bruijn counterpart. As we mentioned earlier, co-de Bruijn terms in an arbitrary scope are represented by the pairing of a term indexed by its precise support with a thinning embedding this support back into the wider scope. When working with such a representation, it is convenient to have access to an 'opened' view where the outer thinning has been pushed inside therefore exposing the term's top-level constructor, ready for case-analysis.

The following diagram shows the correspondence between an 'opened' application node using the view (the diamond '\$' node) with two subterms both living in the outer scope and its co-de Bruijn form (the circular '\$' node) with an outer thinning selecting the term support.

The outer thinning of the co-de Bruijn term is obtained precisely by computing the join of the respective outer thinnings of the 'opened' application's function and argument.
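The Python sketch below illustrates this step for an application node. It goes slightly beyond what the text states: in addition to computing the join, it re-expresses each subterm's thinning relative to the joint support, which is our assumption about how the closed node would be assembled. The names relative and close_app, and the tuple encoding of terms, are hypothetical.

```python
def relative(bits, support, n):
    """Re-express bits (a subset of support) as a thinning into support."""
    out, pos = 0, 0
    for i in range(n):
        if (support >> i) & 1:        # only variables of the joint support
            out |= ((bits >> i) & 1) << pos
            pos += 1
    return (pos, out)

def close_app(fun, arg):
    """Turn an 'opened' application into a (node, outer thinning) pair."""
    (f, (n, bf)), (a, (m, ba)) = fun, arg
    assert n == m, "both subterms must live in the same outer scope"
    support = bf | ba                 # the join of the two outer thinnings
    node = ("app", (f, relative(bf, support, n)),
                   (a, relative(ba, support, n)))
    return (node, (n, support))

# In a scope of three variables: f uses the oldest, x the most local one.
node, outer = close_app(("f", (3, 0b100)), ("x", (3, 0b001)))
assert outer == (3, 0b101)
assert node == ("app", ("f", (2, 0b10)), ("x", (2, 0b01)))
```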

These explicit bit manipulations will be preserved during compilation and thus deliver more efficient code.

#### 5.5 Compiled Code

The following code block shows the JavaScript code that is produced when compiling the view function. We chose to use the JavaScript backend rather than e.g. the ChezScheme one because it produces fairly readable code. We have modified the backend to also write comments reminding the reader of the type of the function being defined and the data constructors the natural number tags correspond to. These changes are now available to all in Idris 2 version 0.6.0.

The only manual modifications we have performed are the inlining of a function corresponding to a **case** block, renaming variables and property names to make them human-readable, introducing the \$tail definitions to make lines shorter, and slightly changing the layout.

```
/* Thin.Smart.view : (th : Th sx sy) -> View th */
function Thin_Smart_view($th) {
  switch($th.bigEnd) {
    case 0n: return {h: 0 /* Done */};
    default: {
      const $predBE = ($th.bigEnd-1n);
      const $test = choose(notEq(($th.encoding&1n), 0n));
      switch($test.tag) {
        case 0: /* Left */ {
          const $tail = $th.encoding>>1n;
          return { tag: 1 /* Keep */
                 , val: {bigEnd: $predBE, encoding: $tail}}; }
        case 1: /* Right */ {
          const $tail = $th.encoding>>1n;
          return { tag: 2 /* Drop */
                 , val: {bigEnd: $predBE, encoding: $tail}}; }
}}}}
```

Readers can see that the compilation process has erased all of the indices and the proofs showing that the invariant tying the efficient runtime representation to the high-level specification is maintained. A thinning is represented at runtime by a JavaScript object with two properties corresponding to Th's runtime relevant fields: bigEnd and encoding. Both are storing a JavaScript BigInt (one corresponding to the Nat, the other to the Integer). For instance the thinning [01101] would be at runtime { bigEnd: 5n, encoding: 13n }.

The view proceeds in two steps. First if the bigEnd is 0n then we know the thinning is empty and can immediately return the Done constructor. Otherwise we know the thinning to be non-empty and so we can compute the big end of its tail (\$predBE) by subtracting one from the non-zero bigEnd. We can then inspect the bit at position 0 to decide whether to return a Keep or a Drop constructor. This is performed by using a bit mask to 0-out all the other bits (\$th.encoding&1n) and checking whether the result is zero. If it is not equal to 0 then we emit Keep and compute the \$tail of the thinning by shifting the original encoding to drop the 0th bit. Otherwise we emit Drop and compute the same tail.

By running view on this [01101] thinning, we would get back (Keep [0110]), that is to say { tag: 1, val: { bigEnd: 4n, encoding: 6n } }.
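The same computation can be replayed by hand. The Python sketch below is a transliteration of the erased view, written for illustration and not produced by any Idris 2 backend, with tuples standing in for the tagged JavaScript objects.

```python
def view(th):
    """Erased view on a thinning given as a (big_end, encoding) pair."""
    big_end, encoding = th
    if big_end == 0:
        return ("Done",)
    tail = (big_end - 1, encoding >> 1)   # drop the 0th bit
    if encoding & 1:                      # mask out every bit but the 0th
        return ("Keep", tail)
    return ("Drop", tail)

assert view((5, 0b01101)) == ("Keep", (4, 0b0110))   # encoding 13 becomes 6
assert view((4, 0b0110)) == ("Drop", (3, 0b011))
assert view((0, 0)) == ("Done",)
```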

Thanks to Idris 2's implementation of Quantitative Type Theory we have managed to manufacture a high level representation that can be manipulated like a classic inductive family using smart constructors and views without giving up an inch of control on its runtime representation.

The remaining issues such as the fact that we form the view's constructors only to immediately take them apart thus creating needless allocations can be tackled by reusing Wadler's analysis (section 12 of [24]).

# 6 Conclusion

We have seen that inductive families provide programmers with ways to root out bugs by enforcing strong invariants. Unfortunately these families can get in the way of producing performant code despite existing optimisation passes erasing redundant or runtime irrelevant data. This tension has led us to take advantage of Quantitative Type Theory in order to design a library combining the best of both worlds: the strong invariants and ease of use of inductive families together with the runtime performance of explicit bit manipulations.

#### 6.1 Related Work

For historical and ergonomic reasons, idiomatic Coq development tends to center on writing programs in a subset of the language quite close to OCaml and then proving properties about these programs in the runtime irrelevant Prop fragment. This can lead to awkward encodings when the unrefined inputs force the user to consider cases which ought to be impossible. Common coping strategies involve relaxing the types to insert a modicum of partiality e.g. returning an option type or taking an additional input to be used as the default return value. This approach completely misses the point of type-driven development. We benefit from having as much information as possible available during interactive editing. This information not only helps tremendously getting the definitions right by ensuring we always maintain vital invariants thus making invalid states unrepresentable, it also gives programmers access to type-driven tools and automation. Thankfully libraries such as Equations [20,21] can help users write more dependently typed programs, by taking care of the complex encoding required in Coq. A view-based approach similar to ours but using Prop instead of the zero quantity ought to be possible. We expect that the views encoded this way in Coq will have an even worse computational behaviour given that Equations uses a sophisticated elaboration process to encode dependent pattern-matching into Gallina. However Coq does benefit from good automation support for unfolding lemmas, inversion principles, and rewriting by equalities. It may compensate for the awkwardness introduced by the encoding.

Prior work on erasure [22] has the advantage of offering a fully automated analysis of the code. The main inconvenience is that users cannot state explicitly that a piece of data ought to be runtime irrelevant and so they may end up inadvertently using it which would prevent its erasure. Quantitative Type Theory allows us to explicitly choose what is and is not runtime relevant, with the quantity checker keeping us true to our word. This should ensure that the resulting program has a much more predictable complexity.

A somewhat related idea was explored by Brady, McKinna, and Hammond in the context of circuit design [7]. In their verification work they index an efficient representation (natural numbers as a list of bits) by its meaning as a unary natural number. All the operations are correct by construction as witnessed by the use of their unary counterparts acting as type-level specifications. In the end their algorithms still process the inductive family instead of working directly with binary numbers. This makes sense in their setting where they construct circuits and so are explicitly manipulating wires carrying bits. By contrast, in our motivating example we really want to get down to actual (unbounded) integers rather than linked lists of bits.

#### 6.2 Limitations and Future Work

Overall we found this case study using Idris 2, a state of the art language based on Quantitative Type Theory, very encouraging. The language implementation is still experimental but none of the issues are intrinsic limitations. We hope to be able to push this line of work further, tackling the following limitations and exploring more advanced use cases.

**Limitations** Unfortunately it is only propositionally true that (view (keep th x)) computes to (Keep th x) (and similarly for done/Done and drop/Drop). This means that users may need to manually deploy these lemmas when proving the properties of functions defined by pattern matching on the result of calling the view function. This annoyance would disappear if we had the ability to extend Idris 2's reduction rules with user-proven equations as implemented in Agda and formally studied by Cockx, Tabareau, and Winterhalter [10].

In this paper's case study, we were able to design the core Invariant relation making the invariants explicit in such a way that it would be provably proof irrelevant. This may not always be possible given the type theory currently implemented by Idris 2. Adding support for a proof-irrelevant sort of propositions (see e.g. Altenkirch, McBride, and Swierstra's work [3]) could solve this issue once and for all.

The Idris 2 standard library thankfully gave us access to a polished pure interface to explicitly manipulate an integer's bits. However these built-in operations came with no built-in properties whatsoever. And so we had to postulate a (minimal) set of axioms and prove a lot of useful corollaries ourselves. There is even less support for other low-level operations such as reading from a read-only array, or manipulating pointers.

We also found the use of runtime irrelevance (the **0** quantity) sometimes frustrating. Pattern-matching on a runtime irrelevant value in a runtime relevant context is currently only possible if it is manifest for the compiler that the value could only arise using one of the family's constructors. In non-trivial cases this is unfortunately merely provable rather than self-evident. Consequently we are forced to jump through hoops to appease the quantity checker, and end up defining complex inversion lemmas to bypass these limitations. This could be solved by a mix of improvements to the typechecker and meta-programming using prior ideas on automating inversion [12,15,19].

**Future work** We are planning to explore more memory-mapped representations equipped with a high level interface.

We already have experimental results demonstrating that we can use a read-only array as a runtime representation of a binary search tree. Search can be implemented as a proven-correct high level decision procedure that is seemingly recursively exploring the "tree". At runtime however, this will effectively execute like a classic search by dichotomy over the array.

More generally, we expect that a lot of the work on programming on serialised data done in LoCal [23] thanks to specific support from the compiler can be done as-is in a QTT-based programming language. Indeed, QTT's type system is powerful enough that tracking these invariants can be done purely in library code.

In the short term, we would like to design a small embedded domain specific language giving users the ability to more easily build and take apart products and sums efficiently represented in the style we presented here. Staging would help here to ensure that the use of the eDSL comes at no runtime cost. There are plans to add type-enforced staging to Idris 2, thus really making it the ideal host language for our project.

Our long term plan is to go beyond read-only data and look at imperative programs proven correct using separation logic and see how much of this after-the-fact reasoning can be brought back into the types to enable a high-level correct-by-construction programming style that behaves the same at runtime.

Acknowledgements We are grateful to Conor McBride for discussions pertaining to the fine details of the unsafe encoding used in TypOS, as well as James McKinna, Fredrik Nordvall Forsberg, Ohad Kammar, and Jacques Carette for providing helpful comments and suggestions on early versions of this paper.

This research was funded by the Engineering and Physical Sciences Research Council (grant number EP/T007265/1).

The research data underpinning this publication [1] can be accessed at https://doi.org/10.17630/bd1085ce-a462-4a8b-ae81-9ededb4aea21.

# References


2003, Revised Selected Papers. Lecture Notes in Computer Science, vol. 3085, pp. 115–129. Springer (2003). https://doi.org/10.1007/978-3-540-24849-1\_8



# Pragmatic Gradual Polymorphism with References

Wenjia Ye<sup>1</sup> and Bruno C. d. S. Oliveira<sup>1</sup>

The University of Hong Kong, Hong Kong SAR, China {wjye,bruno}@cs.hku.hk

Abstract. Gradualizing System F has been widely discussed. A big challenge is to preserve relational parametricity and/or the gradual guarantee. Most past work has focused on the preservation of parametricity, but often without the gradual guarantee. A few recent works satisfy both properties by giving up System F syntax, or with some restrictions and the introduction of sophisticated mechanisms in the dynamic semantics.

While parametricity is important for polymorphic languages, most mainstream languages typically do not satisfy it, for a variety of different reasons. In this paper, we explore the design space of polymorphic languages that satisfy the gradual guarantee, but do not preserve parametricity. When parametricity is not a goal, the design of polymorphic gradual languages can be considerably simplified. Moreover, it becomes easy to add features that are of practical importance, such as mutable references. We present a new gradually typed polymorphic calculus, called λ<sup>G</sup><sub>gpr</sub>, with mutable references and with an easy proof of the gradual guarantee. In addition, compared to other gradual polymorphism work, λ<sup>G</sup><sub>gpr</sub> is defined using a Type-Directed Operational Semantics (TDOS), which allows the dynamic semantics to be defined directly instead of elaborating to a target cast language. λ<sup>G</sup><sub>gpr</sub> and all the proofs in this paper are formalized in Coq.

Keywords: Gradual Typing · Type System · Polymorphism.

# 1 Introduction

Statically typed languages can statically detect potential errors in programs, but must necessarily be conservative and reject some well-behaved programs. With dynamically typed languages, all programs are accepted, which offers a great amount of flexibility. However, the accepted dynamic programs include programs with type errors, making it harder to detect programs that are ill-behaved because of type errors. Considering the weaknesses and advantages of static and dynamic type systems, many approaches have proposed to integrate these two spectrums [1,7,35,22,8]. *Gradual typing* [31,35] provides a smooth integration of the two styles and has been under active research in the programming languages community. In addition to the type soundness property, a gradual language should behave as a static language if it is fully annotated. Conversely, it should behave as a dynamic language for fully dynamic programs. Importantly, the *gradual guarantee* [32] has been proposed to ensure a smooth transition between static and dynamic typing.

The importance of System F as a foundation for programming languages with polymorphism naturally leads to the question of whether it is possible to gradualize it. Various researchers have explored this question. In this line of research, a long-standing goal has been how to preserve *relational parametricity* [28]. Parametricity ensures a uniform behavior for all instantiations of polymorphic functions, and is an important property of System F. In addition it is also desirable to preserve the *gradual guarantee* [32], which is recognized as an important property for gradual languages. Unlike System F, where no dynamic mechanism is needed to ensure parametricity, with gradualized versions of System F this is no longer the case. Ahmed *et al.* [3] showed that parametricity can be enforced using a *dynamic sealing* mechanism at runtime. They prove parametricity, but the gradual guarantee is not discussed. Igarashi *et al.* [17] improved on the dynamic sealing approach and proposed a more efficient mechanism. While the gradual guarantee has been discussed, it was left as a conjecture. Toro *et al.* [37] even proved that gradual guarantee and parametricity are incompatible. By giving up the traditional System F syntax, New *et al.* [24] proved the gradual guarantee and parametricity by using user-provided sealing annotations, but this requires resorting to syntax that is not based on System F. Finally, Labrada *et al.* [20] proved the gradual guarantee and parametricity by inserting sealing with some restrictions. For instance, only base and variable types can be used to instantiate type applications.

While parametricity is highly valued and is guaranteed in practice in some functional languages, many mainstream programming languages, such as Java, TypeScript or Flow, do not have parametricity. In mainstream languages the value of parametric polymorphism, and its ability to express a whole family of functions in a reusable and type-safe manner, is certainly recognized. However, such languages are imperative and come with a variety of programming language features (such as unrestricted forms of mutable state, exceptions, parallelism and concurrency mechanisms, reflection, etc.) that make it hard to apply reasoning principles known in functional programming. In particular, most of those features are known to be highly challenging to deal with in the presence of parametricity [2,18,23]. This makes it non-obvious how to design a language with all those features, while preserving parametricity, in the first place. Moreover, preserving parametricity may require extra dynamic checks at runtime, which may discourage implementers for whom performance is a critical factor. Therefore, all the aforementioned programming languages support System F like mechanisms to deal with polymorphism and benefit from the reuse afforded by polymorphism. However, the reasoning principles that arise from polymorphism, such as parametricity, are discarded and not enforced.

In particular, programming languages such as TypeScript or Flow, which support some form of gradual/optional typing and are widely used in practice, do not support parametricity. Figure 1 encodes, in TypeScript and Flow, an example from Ahmed *et al.*'s work [3] that was used to illustrate the parametricity challenge in gradual typing. In this program, the polymorphic function Ks has a polymorphic type: (*X* → *Y* → *Y*), where *X* and *Y* are type variables. In a calculus with parametricity, we know that a function with such a type should always return the *second* argument or, in the presence of runtime casts, return an error. In the program, Ks is a function that casts a dynamic constant function (K) that returns the *first* argument, which violates parametricity. When the TypeScript and Flow programs are run, the first argument 2 is returned, illustrating that neither language enforces parametricity. In a gradual language with parametricity the result that we would expect is an error.

```
// (a) TypeScript code.
function K(x: any, y: any): any {
    return x;
}
function Ks<X, Y>(x: X, y: Y): Y {
    // Cast the dynamic function K to the polymorphic function type.
    let CAST = (K as any) as ((x: X, y: Y) => Y);
    return CAST(x, y);
}
function run() {
    console.log(Ks<number, number>(2, 3));
}
```

```
// (b) Flow code.
function K(x: any, y: any): any {
    return x;
}
function Ks<X, Y>(x: X, y: Y): Y {
    // Cast the dynamic function K to the polymorphic function type.
    let CAST = ((K : any) : ((x: X, y: Y) => Y));
    return CAST(x, y);
}
function run() {
    console.log(Ks(2, 3));
}
```
Fig. 1: Ahmed *et al.* [3] program for illustrating parametricity in TypeScript and Flow.

Furthermore, even if we turn to Typed Racket [36], which is a well-established gradual language used in both gradual typing research and in practice, the result is similar and 2 is returned:

```
#lang typed/racket
(: K Any)
(define K (λ (x) (λ (y) x)))
(define Ks
  (cast K (All (X Y) (-> X (-> Y Y)))))
((Ks 2) 3)
```
Therefore Typed Racket does not enforce parametricity either.

In this paper, we explore the more pragmatic design space of polymorphic gradual languages with the gradual guarantee, but no parametricity. We believe that such designs are relevant because many practical language designs do not support parametricity, but support various other programming features instead. Dropping the requirement for parametricity enables us to explore language designs with many relevant practical features, while being in line with current designs for existing practical gradually typed languages. In particular, this paper studies the combination of parametric polymorphism, gradual typing and references. We show that, when parametricity is not a goal, the design of gradually polymorphic languages can be simplified, making it easier to add features such as references. Moreover, the gradual guarantee, which has been shown to be quite problematic in all existing calculi with gradual polymorphism, is simple to achieve. We present a standard static calculus with polymorphism and mutable references called λ<sub>gpr</sub>. Then we introduce the gradual counterpart, called λ<sup>G</sup><sub>gpr</sub>.

The approach that we follow to give the dynamic semantics to λ<sup>G</sup><sub>gpr</sub> is to use the recently proposed Type-Directed Operational Semantics (TDOS) [16,42]. In contrast, traditionally the semantics of a gradually typed language is defined by elaboration to a target *cast calculus* such as the *blame calculus* [39]. In other words, the dynamic semantics of the gradual source language is given *indirectly* by translating to the target language. As Ye *et al.* [42] show, TDOS avoids such indirection and uses bidirectional typing and type annotations to enforce both *implicit* and *explicit* casting at runtime in gradually typed languages.

In summary, we make the following contributions in this paper:


https://www.zenodo.org/badge/latestdoi/581421930

# 2 Overview

This section provides background on gradual polymorphic calculi and calculi with gradual references, and presents the key ideas of our static system (λ<sub>gpr</sub>) with polymorphism and references and of its gradual counterpart (λ<sup>G</sup><sub>gpr</sub>).

#### 2.1 Background

*Gradual References.* Mutable references read or write content in a memory cell. A common set of operations is: allocating a memory cell (ref *e*), updating a reference (*e*<sub>1</sub> := *e*<sub>2</sub>) and reading the content of a reference (!*e*). Locations (*o*) point to memory cells. For a reference value ref 1, a new location (*o*) is generated and the value 1 is stored in the cell at location *o*. If 2 is assigned to this location (*o* := 2), the cell value is updated to 2. Later, when we read this cell (!*o*), 2 is returned. Siek *et al.* [31] defined an *invariant* consistency relation for reference types. Reference types are only consistent with themselves. For example:

$$(\lambda x.\ (x := 2) : \mathsf{Ref}\ \star \rightarrow \mathsf{Ref}\ \star)\ (\mathsf{ref}\ 1) \qquad \text{Rejected!}\quad \mathsf{Ref}\ \mathsf{Int} \not\sim \mathsf{Ref}\ \star$$

Although the type Int is consistent with ⋆, it does not mean that Ref Int is consistent with Ref ⋆. Therefore, the argument type is not consistent with the function input, and the program is rejected. Herman *et al.* [14] proposed a gradually typed lambda source language with references, which defines the dynamic semantics by elaborating to a coercion calculus. The above program is allowed in their calculus. They define *variant* consistency, where if *A* is consistent with *B* then Ref *A* is consistent with Ref *B*. In their calculus, casts are combined to achieve space efficiency. Furthermore, Siek *et al.* [33] explored monotonic references with variant consistency. Their main consideration is space efficiency: no runtime overhead is imposed in the statically typed part of programs. None of the above works considers the gradual guarantee.
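The difference between the two treatments of reference types can be made concrete with a small sketch. The code below is only an illustration written for this discussion, not part of any of the cited calculi: the names `Ty`, `equalTy`, `consistent` and the `variantRefs` flag are assumptions, and the unknown type ⋆ is written `dyn`.

```typescript
// Types of a small gradual language with references; "dyn" is the unknown type ⋆.
type Ty =
  | { tag: "int" }
  | { tag: "dyn" }
  | { tag: "ref"; elem: Ty }
  | { tag: "fun"; dom: Ty; cod: Ty };

const Int: Ty = { tag: "int" };
const Dyn: Ty = { tag: "dyn" };
const Ref = (elem: Ty): Ty => ({ tag: "ref", elem });

// Structural (syntactic) equality on types.
function equalTy(a: Ty, b: Ty): boolean {
  if (a.tag === "ref" && b.tag === "ref") return equalTy(a.elem, b.elem);
  if (a.tag === "fun" && b.tag === "fun")
    return equalTy(a.dom, b.dom) && equalTy(a.cod, b.cod);
  // Base cases: Int and ⋆ are equal only to themselves.
  return a.tag === b.tag && (a.tag === "int" || a.tag === "dyn");
}

// Consistency. With variantRefs = false, Ref A ~ Ref B only if A and B are equal
// (invariant); with variantRefs = true, Ref A ~ Ref B whenever A ~ B (variant).
function consistent(a: Ty, b: Ty, variantRefs: boolean): boolean {
  if (a.tag === "dyn" || b.tag === "dyn") return true;
  if (a.tag === "int" && b.tag === "int") return true;
  if (a.tag === "fun" && b.tag === "fun")
    return consistent(a.dom, b.dom, variantRefs) && consistent(a.cod, b.cod, variantRefs);
  if (a.tag === "ref" && b.tag === "ref")
    return variantRefs ? consistent(a.elem, b.elem, variantRefs) : equalTy(a.elem, b.elem);
  return false;
}

// The argument type Ref Int against the function input Ref ⋆.
const rejectedWhenInvariant = consistent(Ref(Int), Ref(Dyn), false); // false
const acceptedWhenVariant = consistent(Ref(Int), Ref(Dyn), true);    // true
```

Under the invariant relation the example program above is rejected, while the variant relation accepts it and defers the check on the cell contents to run time.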

Toro and Tanter [38] showed how to employ the Abstracting Gradual Typing (AGT) [12] methodology to design a gradually typed calculus with mutable references ($\lambda_{\widetilde{REF}}$). The dynamic semantics of their source language is defined by translation to an evidence-based calculus. They prove a bisimulation with the coercion calculus by Herman *et al.* [14]. $\lambda_{\widetilde{REF}}$ is proved to satisfy the gradual guarantee. The consistency of $\lambda_{\widetilde{REF}}$ is also variant.

*Gradual Polymorphism.* Gradual polymorphism is a popular topic, and researchers have been working in this area for a long time. Prior work has focused on two key properties: *relational parametricity* [28] and the *gradual guarantee* [32]. Relational parametricity ensures that all instantiations of a polymorphic value behave uniformly. The gradual guarantee ensures that making a program more dynamic, by making its type annotations less precise, does not change its behavior: if the more static program runs without errors, so does the more dynamic one.

Satisfying these two properties at once has been shown to be problematic. Ahmed *et al.* [3] showed that a naive combination of the unknown type ⋆ and type substitution breaks relational parametricity. They show the problem using a simple expression with two casts. To simplify the presentation, we ignore blame labels. Suppose that *K*<sup>⋆</sup> = ⌈λ*x*. λ*y*. *x*⌉, the dynamically typed constant function, is cast to each of two polymorphic types:

$$K^\star: \star \Rightarrow \forall X. \forall Y. X \to Y \to X \qquad \qquad K^\star: \star \Rightarrow \forall X. \forall Y. X \to Y \to Y$$

The notation *e* : *A* ⇒ *B*, borrowed from the blame calculus [29], means casting expression *e* from type *A* to type *B*. The constant function *K*<sup>⋆</sup> returns the first argument. Considering relational parametricity, a value of type ∀*X*. ∀*Y*. *X* → *Y* → *X* should be a constant function that always returns the first argument, while a value of type ∀*X*. ∀*Y*. *X* → *Y* → *Y* should return the second argument. Therefore, the first cast should succeed and the second cast should fail. However, if these two casts are applied to the arguments in the usual way employing type substitutions, then we obtain the following:

$$\begin{array}{l} (K^\star : \star \Rightarrow \forall X. \forall Y. X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ \hookrightarrow^{*} (K^\star : \star \Rightarrow \mathsf{Int} \to \mathsf{Int} \to \mathsf{Int})\ 2\ 3 \\ \hookrightarrow^{*} 2 \\[1ex] (K^\star : \star \Rightarrow \forall X. \forall Y. X \to Y \to Y)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ \hookrightarrow^{*} (K^\star : \star \Rightarrow \mathsf{Int} \to \mathsf{Int} \to \mathsf{Int})\ 2\ 3 \\ \hookrightarrow^{*} 2 \end{array}$$

The second cast succeeds and returns the first argument, which breaks parametricity. The reason for this behavior is that, after the type substitution, the polymorphic information is lost. Note that, as we have seen in Section 1, this is exactly how various practical languages (TypeScript, Flow and Typed Racket) behave.
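The loss of polymorphic information can be observed at the level of types alone. The following sketch, under the assumption that the instantiating type is closed, implements naive type substitution and shows that both polymorphic types collapse to the same type once instantiated with Int. The names `Ty`, `subst`, `instantiate` and `show` are illustrative and do not come from any of the calculi discussed here.

```typescript
type Ty =
  | { tag: "int" }
  | { tag: "var"; name: string }
  | { tag: "fun"; dom: Ty; cod: Ty }
  | { tag: "all"; binder: string; body: Ty };

const Int: Ty = { tag: "int" };
const tvar = (name: string): Ty => ({ tag: "var", name });
const fun = (dom: Ty, cod: Ty): Ty => ({ tag: "fun", dom, cod });
const all = (binder: string, body: Ty): Ty => ({ tag: "all", binder, body });

// subst(t, x, c) computes t[x ↦ c]; c is assumed closed, so no capture can occur.
function subst(t: Ty, x: string, c: Ty): Ty {
  switch (t.tag) {
    case "int": return t;
    case "var": return t.name === x ? c : t;
    case "fun": return fun(subst(t.dom, x, c), subst(t.cod, x, c));
    case "all": return t.binder === x ? t : all(t.binder, subst(t.body, x, c));
  }
}

// Instantiate the outermost quantifier, as a type application would.
function instantiate(t: Ty, c: Ty): Ty {
  if (t.tag !== "all") throw new Error("not a polymorphic type");
  return subst(t.body, t.binder, c);
}

function show(t: Ty): string {
  switch (t.tag) {
    case "int": return "Int";
    case "var": return t.name;
    case "fun": return "(" + show(t.dom) + " -> " + show(t.cod) + ")";
    case "all": return "(forall " + t.binder + ". " + show(t.body) + ")";
  }
}

// ∀X. ∀Y. X → Y → X and ∀X. ∀Y. X → Y → Y
const first = all("X", all("Y", fun(tvar("X"), fun(tvar("Y"), tvar("X")))));
const second = all("X", all("Y", fun(tvar("X"), fun(tvar("Y"), tvar("Y")))));

const firstAtInt = instantiate(instantiate(first, Int), Int);
const secondAtInt = instantiate(instantiate(second, Int), Int);
// Both print as (Int -> (Int -> Int)): the casts can no longer be told apart.
```

Once the two target types coincide, a cast from ⋆ has no information left with which to reject the function that returns its first argument.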

Much of the work on gradual polymorphism aims at addressing the above problem. That is, for the second cast we would like to obtain blame instead of 2, so that parametricity is not violated. While the preservation of parametricity is a worthy goal, it typically requires substantial changes to a calculus to ensure its preservation, since naive direct type substitutions do not work. Furthermore, this also affects proofs, which can become significantly more complicated due to the changes in the calculus. To address this problem a well-known approach, originally proposed by Ahmed *et al.* [3], is to employ dynamic sealing. With dynamic sealing we do not do the substitution directly but record a fresh variable binding. However, even calculi that satisfy parametricity have to compromise on the important gradual guarantee property, or System F syntax, or be equipped with heavy forms of runtime evidence [37,20]. A thorough discussion of various approaches is given in Section 6.

#### 2.2 Key Ideas

Our key design decision is to give up support for parametricity in exchange for a simpler calculus that is also easier to extend with other important practical features. In particular, in our work we illustrate how to obtain a polymorphic gradually typed calculus, with gradual references and with the gradual guarantee. In contrast, none of the existing gradually polymorphic calculi supports references and the gradual guarantee is only supported with restrictions [20]; or major modifications in the syntax and semantics of the language [24]; or not supported/proved at all [37,3,17].

*A direct semantics with a TDOS.* Our gradually typed calculus λ<sup>G</sup><sub>gpr</sub> has a direct semantics, obtained by using a TDOS [15] approach. In λ<sup>G</sup><sub>gpr</sub>, type annotations are *operationally relevant* and play a role similar to casts. Nevertheless, implicit casts should also be enforced at runtime in a gradual calculus. Most previous work makes the implicit casts explicit via an elaboration process, which is why the dynamic semantics is not defined directly. We resort to bidirectional typing with inference (⇒) and checking (⇐) modes. In checking mode, the consistency (∼) between a value and the checked type is checked and enforced via an implicit cast. At compile time, the flexible consistency relation allows more programs to be accepted, while the checking mode signals the casts that are needed at runtime. For example, consider the typing rule for applications:

$$\frac{\Sigma; \Gamma \vdash e_1 \Rightarrow A_1 \to A_2 \qquad \Sigma; \Gamma \vdash e_2 \Leftarrow A_1}{\Sigma; \Gamma \vdash e_1\ e_2 \Rightarrow A_2}\ \text{Typ-app}$$

The checking mode signals an implicit cast for the argument. The argument *e*<sub>2</sub> is checked to be consistent with the type *A*<sub>1</sub> using the bidirectional subsumption rule:

$$\frac{\Sigma; \Gamma \vdash e \Rightarrow B \qquad \Gamma \vdash B \sim A}{\Sigma; \Gamma \vdash e \Leftarrow A}\ \text{Typ-sim}$$

For instance, (λ*x*. *x* : Int → Int) (True : ⋆) type-checks, but at run-time the invalid cast of the argument value (True : ⋆) is detected and an error is reported.
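The interplay between the two modes can be sketched for a tiny fragment with literals, variables, annotations, lambdas and applications. This is a simplified illustration and not the typing relation of the calculus: the names `infer`, `check`, `consistent` and `castLit` are assumptions, stores and polymorphism are omitted, the function position of an application must infer to a function type, and `castLit` only models the runtime check on base values.

```typescript
type Ty =
  | { tag: "int" }
  | { tag: "bool" }
  | { tag: "dyn" } // the unknown type ⋆
  | { tag: "fun"; dom: Ty; cod: Ty };

type Expr =
  | { tag: "lit"; value: number | boolean }
  | { tag: "var"; name: string }
  | { tag: "lam"; param: string; body: Expr } // typed only through an annotation
  | { tag: "anno"; expr: Expr; ty: Ty }
  | { tag: "app"; fn: Expr; arg: Expr };

type Env = { [name: string]: Ty };

const Int: Ty = { tag: "int" };
const Bool: Ty = { tag: "bool" };
const Dyn: Ty = { tag: "dyn" };

function consistent(a: Ty, b: Ty): boolean {
  if (a.tag === "dyn" || b.tag === "dyn") return true;
  if (a.tag === "fun" && b.tag === "fun")
    return consistent(a.dom, b.dom) && consistent(a.cod, b.cod);
  return a.tag === b.tag;
}

// Inference mode (⇒).
function infer(env: Env, e: Expr): Ty {
  switch (e.tag) {
    case "lit":
      return typeof e.value === "number" ? Int : Bool;
    case "var": {
      const t = Object.prototype.hasOwnProperty.call(env, e.name) ? env[e.name] : undefined;
      if (t === undefined) throw new Error("unbound variable " + e.name);
      return t;
    }
    case "anno":
      check(env, e.expr, e.ty);
      return e.ty;
    case "app": {
      const f = infer(env, e.fn);
      if (f.tag !== "fun") throw new Error("not a function type");
      check(env, e.arg, f.dom); // checking mode marks the implicit cast
      return f.cod;
    }
    case "lam":
      throw new Error("a bare lambda needs an annotation");
  }
}

// Checking mode (⇐): a lambda is checked against a function type; otherwise
// a type is inferred and must be consistent with the expected one.
function check(env: Env, e: Expr, expected: Ty): void {
  if (e.tag === "lam" && expected.tag === "fun") {
    check({ ...env, [e.param]: expected.dom }, e.body, expected.cod);
    return;
  }
  if (!consistent(infer(env, e), expected)) throw new Error("type error");
}

// Simplified runtime check behind a cast of a base value to `target`.
function castLit(value: number | boolean, target: Ty): number | boolean {
  const ok =
    target.tag === "dyn" ||
    (target.tag === "int" && typeof value === "number") ||
    (target.tag === "bool" && typeof value === "boolean");
  if (!ok) throw new Error("cast error");
  return value;
}

// (λx. x : Int → Int) (True : ⋆)
const idInt: Expr = {
  tag: "anno",
  expr: { tag: "lam", param: "x", body: { tag: "var", name: "x" } },
  ty: { tag: "fun", dom: Int, cod: Int },
};
const dynTrue: Expr = { tag: "anno", expr: { tag: "lit", value: true }, ty: Dyn };
const program: Expr = { tag: "app", fn: idInt, arg: dynTrue };
const staticType = infer({}, program); // accepted statically, since ⋆ ∼ Int
```

Statically the program is accepted with type Int, whereas `castLit(true, Int)` throws, mirroring the runtime error described above. Removing the annotation ⋆ from the argument makes `infer` reject the program, because Bool is not consistent with Int.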

*Conservativity, no parametricity and direct substitutions.* The λ<sup>G</sup><sub>gpr</sub> calculus is a conservative extension of its static counterpart. Notably, λ<sup>G</sup><sub>gpr</sub> is a simple polymorphic calculus that does not use mechanisms such as dynamic sealing and evidence. Instead, since parametricity is not a goal, we can simply use direct type substitutions during reduction as follows:

$$((\Lambda X.\ e : A) : \forall X.\ B)\ C \hookrightarrow e[X \mapsto C] : A[X \mapsto C] : B[X \mapsto C]$$

Our type application rule substitutes types directly, unlike previous work with dynamic sealing, where a fresh type name variable is generated and stored in a global or local context. Dynamic sealing takes extra time and space. With a large enough number of type applications, the space consumption may grow without bound.

*Gradual guarantee and references.* Furthermore, λ<sup>G</sup><sub>gpr</sub> is mechanically formalized and shown to have the gradual guarantee. Our application of the eager semantics and the choice of value forms for λ<sup>G</sup><sub>gpr</sub> simplify the gradual guarantee. To prove the gradual guarantee we need a precision (⊑) relation. The gradual guarantee theorem needs to ensure that if the more static program does not go wrong, then the less static program does not go wrong either. The precision relation is used to relate two programs that have different type information. Type precision compares the amount of static type information for programs and types. A type is more precise than another if it is more static. The unknown type (⋆) is the least precise type, since we do not have any static information about that type. Let's consider two programs:

$$\begin{aligned} \lambda x.\ 1 &: \mathsf{Int} \to \mathsf{Int} \\ \lambda x.\ 1 &: \star \to \star \end{aligned}$$

The first one is more precise than the second one because the second program is fully dynamic. The value forms of λ<sup>G</sup><sub>gpr</sub> are annotated and include terms such as *i* : Int and (λ*x*. *e* : *A* → *B*) : *C*. The simplicity of the proof of the gradual guarantee is closely related to the choice of representation of values. In λ<sup>G</sup><sub>gpr</sub>, the gradual guarantee theorem can be formalized in a simple way with a lemma similar to a lemma proposed by Garcia *et al.* [12]. The lemma states that if *e*<sub>1</sub> is more precise than *e*<sub>2</sub> and *e*<sub>1</sub> takes a step to *e*<sub>1</sub>′, then *e*<sub>2</sub> takes a step to *e*<sub>2</sub>′ and *e*<sub>1</sub>′ is more precise than *e*<sub>2</sub>′. With this lemma, we can infer that two expressions related by precision have the same behavior. Thus, this lemma is enough to obtain the dynamic gradual guarantee. Notably, λ<sup>G</sup><sub>gpr</sub> is extended with mutable references using a form of variant consistency [14,38]. This is in contrast to the previously discussed gradually polymorphic calculi where references are not supported.
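Type precision can be sketched as a simple recursive function. The code below is an illustration only: `morePrecise(a, b)` is a hypothetical name for "a is at least as precise as b", polymorphic types are omitted, and the componentwise treatment of reference types is an assumption made to match the variant consistency mentioned above.

```typescript
type Ty =
  | { tag: "int" }
  | { tag: "dyn" } // the unknown type ⋆, the least precise type
  | { tag: "fun"; dom: Ty; cod: Ty }
  | { tag: "ref"; elem: Ty };

const Int: Ty = { tag: "int" };
const Dyn: Ty = { tag: "dyn" };
const fun = (dom: Ty, cod: Ty): Ty => ({ tag: "fun", dom, cod });
const Ref = (elem: Ty): Ty => ({ tag: "ref", elem });

// morePrecise(a, b) holds when a carries at least as much static information as b.
function morePrecise(a: Ty, b: Ty): boolean {
  if (b.tag === "dyn") return true; // every type is more precise than ⋆
  if (a.tag === "int" && b.tag === "int") return true;
  if (a.tag === "fun" && b.tag === "fun")
    return morePrecise(a.dom, b.dom) && morePrecise(a.cod, b.cod);
  if (a.tag === "ref" && b.tag === "ref") return morePrecise(a.elem, b.elem);
  return false;
}

// The annotations of the two programs above: Int → Int and ⋆ → ⋆.
const staticAnno = fun(Int, Int);
const dynamicAnno = fun(Dyn, Dyn);
const staticIsMorePrecise = morePrecise(staticAnno, dynamicAnno); // true
const dynamicIsMorePrecise = morePrecise(dynamicAnno, staticAnno); // false
```

Term precision lifts this relation to expressions by comparing their annotations structurally, which is what relates the two example programs.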

# 3 The λ<sub>gpr</sub> Calculus: Syntax, Typing and Semantics

In this section, we introduce the λ<sub>gpr</sub> calculus, which is a calculus with references and polymorphism. The λ<sub>gpr</sub> calculus is an extended version of System F with references and is the static calculus that serves as a foundation for the gradual calculus in Section 4.

#### 3.1 Syntax

The syntax of the λ<sub>gpr</sub> calculus is shown in Figure 2.

Fig. 2: λ<sub>gpr</sub> syntax

*Types.* Meta-variables *A*, *B* range over types. Types include base types (Int), function types (*A* → *B*), type variables (*X*), polymorphic types (∀*X*. *A*), the unit type Unit and reference types Ref *A*, which denotes a reference with type *A*.

*Expressions.* Meta-variables *e* range over expressions. Most of the expressions are standard: variables (*x*), integers (*i*), annotations (*e* : *A*), applications (*e*<sub>1</sub> *e*<sub>2</sub>), type applications (*e A*), dereferences (!*e*), assignments (*e*<sub>1</sub> := *e*<sub>2</sub>), references (ref *e*), unit (unit), locations (*o*), lambda abstractions (λ*x* : *A*. *e*) (which are annotated with the input type *A*), and type abstractions (Λ*X*. *e*).

*Values.* Meta-variables *v* range over values. A raw value is either an integer (*i*), a type abstraction (Λ*X*. *e*), a lambda abstraction (λ*x* : *A*. *e*), a unit (unit) or a location (*o*).

*Contexts, stores, locations and frames.* The type context Γ tracks the bound variables *x* with their types and the bound type variables *X*. The typing location Σ tracks the bound locations *o* with their types, while the store µ tracks locations with their stored values during the reduction process. Frames (*F*) include applications, type applications, dereferences, assignments and references.
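The syntactic categories described above map naturally onto algebraic data types. The following declarations are one possible encoding, given only as a sketch: the constructor and field names are assumptions, locations are represented as numbers, and contexts, stores and frames are left out.

```typescript
// Types A, B.
type Ty =
  | { tag: "int" }
  | { tag: "unit" }
  | { tag: "tvar"; name: string }              // X
  | { tag: "fun"; dom: Ty; cod: Ty }           // A → B
  | { tag: "all"; binder: string; body: Ty }   // ∀X. A
  | { tag: "ref"; elem: Ty };                  // Ref A

// Expressions e.
type Expr =
  | { tag: "var"; name: string }                            // x
  | { tag: "int"; value: number }                           // i
  | { tag: "unit" }                                         // unit
  | { tag: "loc"; loc: number }                             // o
  | { tag: "anno"; expr: Expr; ty: Ty }                     // e : A
  | { tag: "app"; fn: Expr; arg: Expr }                     // e1 e2
  | { tag: "tapp"; fn: Expr; tyArg: Ty }                    // e A
  | { tag: "deref"; expr: Expr }                            // !e
  | { tag: "assign"; target: Expr; value: Expr }            // e1 := e2
  | { tag: "ref"; expr: Expr }                              // ref e
  | { tag: "lam"; param: string; paramTy: Ty; body: Expr }  // λx : A. e
  | { tag: "tlam"; binder: string; body: Expr };            // ΛX. e

// Raw values: integers, type abstractions, lambda abstractions, unit, locations.
function isRawValue(e: Expr): boolean {
  switch (e.tag) {
    case "int":
    case "unit":
    case "loc":
    case "lam":
    case "tlam":
      return true;
    default:
      return false;
  }
}

// o1 := (ΛX. (λx : X. x) !o2) Int, the expression used as an example in Section 3.3.
const example: Expr = {
  tag: "assign",
  target: { tag: "loc", loc: 1 },
  value: {
    tag: "tapp",
    fn: {
      tag: "tlam",
      binder: "X",
      body: {
        tag: "app",
        fn: { tag: "lam", param: "x", paramTy: { tag: "tvar", name: "X" }, body: { tag: "var", name: "x" } },
        arg: { tag: "deref", expr: { tag: "loc", loc: 2 } },
      },
    },
    tyArg: { tag: "int" },
  },
};
```

A discriminated union keeps every syntactic form explicit, so a function such as `isRawValue` can be written by case analysis on the `tag` field.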

#### 3.2 Type System

Before introducing the type system, we show the well-formedness of types at the top of Figure 3. The well-formedness of types ensures that there are no free type variables and that each type variable is bound in the contexts.

*Typing relation.* The typing relation of λ<sub>gpr</sub> is shown at the bottom of Figure 3. The type system essentially includes the usual System F rules, except that they also propagate the location typing context (Σ). Reference locations *o* are stored in the location typing context Σ (rule styp-loc). The bound type of a location indicates the type of the stored value. For instance, if *o* points to 1 stored in a memory cell, the integer type of 1 is tracked for the location *o* in the location typing context Σ. Other rules related to references, such as assignments (rule styp-assign), references (rule styp-ref) and dereferences (rule styp-deref), are standard. Annotation expressions (*e* : *A*) are not necessary for the static system, where the annotated types are syntactically equal (rule styp-anno), but they will play an important role in the gradual system and are included here.

Fig. 3: The type system of the λ<sub>gpr</sub> calculus.

Definition 1 defines well-formed stores (µ) with respect to the typing locations Σ, using the typing relation:

Definition 1 (Well-formedness of the store with respect to Σ).

$$\Sigma \vdash \mu \;\equiv\; dom(\mu) = dom(\Sigma) \text{ and } \Sigma; \cdot \vdash \mu(o) : \Sigma(o), \text{ for every } o \in \mu$$

A store is well-formed with respect to the typing location if the store and the typing location have the same domain. For each location in the store, the bound value µ(*o*) can be given the type bound in the typing location (Σ(*o*)).

#### 3.3 Dynamic Semantics

The operational semantics for the λ<sub>gpr</sub> calculus is shown in Figure 4 (we ignore the gray parts for now). The judgment µ; *e* ↪ µ′; *e*′ states that *e* with store µ reduces to *e*′ with the updated store µ′. The reduction rules of λ<sub>gpr</sub> are straightforward. A reference value is bound in the store by a fresh location as shown in rule step-refv. The dereference rule extracts the bound value of the location in the store (rule step-deref). Rule step-eval reduces an expression inside a frame.

$$\mu; e \hookrightarrow_s \mu'; e' \qquad \textit{(Operational semantics)}$$

$$\frac{\mu; e \hookrightarrow_s \mu'; e'}{\mu; F[e] \hookrightarrow_s \mu'; F[e']}\ \text{step-eval} \qquad \frac{}{\mu; v : A : A \hookrightarrow_s \mu; v : A}\ \text{step-annov}$$

$$\frac{}{\mu; o := v \hookrightarrow_s \mu[o \mapsto v]; \mathsf{unit}}\ \text{step-assign} \qquad \frac{}{\mu; ((\Lambda X.\ e) : \forall X.\ A)\ A \hookrightarrow_s \mu; (e[X \mapsto A]) : (A[X \mapsto A])}\ \text{step-tap}$$

$$\frac{o = v \in \mu}{\mu; {!o} \hookrightarrow_s \mu; v : A}\ \text{step-deref} \qquad \frac{}{\mu; ((\lambda x : A.\ e) : A \to B)\ v \hookrightarrow_s \mu; e[x \mapsto v] : B : B}\ \text{step-beta}$$

$$\frac{o \notin \mu}{\mu; \mathsf{ref}\ v \hookrightarrow_s \mu, o = v; o}\ \text{step-refv}$$

Fig. 4: Reduction rules for λ<sub>gpr</sub>.

Let's see how the example *o*<sub>1</sub> := (Λ*X*. (λ*x* : *X*. *x*) !*o*<sub>2</sub>) Int reduces with the existing store *o*<sub>1</sub> = 1, *o*<sub>2</sub> = 2. First, 2 is read from the store *o*<sub>1</sub> = 1, *o*<sub>2</sub> = 2. After the type substitution, 2 is substituted into the lambda. Then 2 is used to update the cell pointed to by *o*<sub>1</sub>. Finally, the store becomes *o*<sub>1</sub> = 2, *o*<sub>2</sub> = 2. The detailed steps are as follows:

$$\begin{aligned} & o_1 = 1, o_2 = 2;\ o_1 := (\Lambda X.\ (\lambda x : X.\ x)\ {!o_2})\ \mathsf{Int} \\ \hookrightarrow\ & \{\text{by rule step-eval, rule step-deref}\} \\ & o_1 = 1, o_2 = 2;\ o_1 := (\Lambda X.\ (\lambda x : X.\ x)\ 2)\ \mathsf{Int} \\ \hookrightarrow\ & \{\text{by rule step-tap}\} \\ & o_1 = 1, o_2 = 2;\ o_1 := (\lambda x : \mathsf{Int}.\ x)\ 2 \\ \hookrightarrow\ & \{\text{by rule step-beta}\} \\ & o_1 = 1, o_2 = 2;\ o_1 := 2 \\ \hookrightarrow\ & \{\text{by rule step-assign}\} \\ & o_1 = 2, o_2 = 2;\ \mathsf{unit} \end{aligned}$$
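The effect of the store rules on this example can be replayed with a small functional model of the store. This is a sketch under simplifying assumptions rather than an implementation of the calculus: only base values are stored, the helper names (`alloc`, `deref`, `assign`, `wellFormed`) are illustrative, and the type application and beta steps are not modelled because they leave both the store and the value 2 unchanged.

```typescript
type Value = { tag: "int"; n: number } | { tag: "unit" };
type BaseTy = "Int" | "Unit";
type Store = { [loc: number]: Value };        // the store µ
type StoreTyping = { [loc: number]: BaseTy }; // the typing location Σ

const int = (n: number): Value => ({ tag: "int", n });
const unit: Value = { tag: "unit" };
const typeOf = (v: Value): BaseTy => (v.tag === "int" ? "Int" : "Unit");
const locations = (m: { [loc: number]: unknown }): number[] =>
  Object.keys(m).map(Number).sort((a, b) => a - b);

// step-refv: bind v at a fresh location o that is not in µ, and return o.
function alloc(mu: Store, v: Value): [Store, number] {
  const fresh = locations(mu).reduce((m, o) => Math.max(m, o), 0) + 1;
  return [{ ...mu, [fresh]: v }, fresh];
}

// step-deref: look up the value bound to o.
function deref(mu: Store, o: number): Value {
  const v = mu[o];
  if (v === undefined) throw new Error("unbound location " + o);
  return v;
}

// step-assign: update the cell at o and reduce to unit.
function assign(mu: Store, o: number, v: Value): [Store, Value] {
  deref(mu, o); // the location must already be bound
  return [{ ...mu, [o]: v }, unit];
}

// Definition 1, specialised to base values: same domain, and every stored
// value has the type recorded for its location in Σ.
function wellFormed(sigma: StoreTyping, mu: Store): boolean {
  const ls = locations(mu);
  const sameDomain = JSON.stringify(ls) === JSON.stringify(locations(sigma));
  return sameDomain && ls.every((o) => typeOf(deref(mu, o)) === sigma[o]);
}

// Replaying the example: start from o1 = 1, o2 = 2.
const mu0: Store = { 1: int(1), 2: int(2) };
const read = deref(mu0, 2);                     // !o2 reduces to 2
const [mu1, result] = assign(mu0, 1, read);     // o1 := 2 reduces to unit
```

After these steps `mu1` maps both locations to 2 and `result` is unit, matching the final configuration of the derivation, while `mu0` is left untouched because each operation returns a new store.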

Theorem 1 shows that the λ<sub>gpr</sub> calculus is deterministic:

Theorem 1 (Determinism of λ<sub>gpr</sub>). *If* Σ; · ⊢<sub>s</sub> *e* : *A*, Σ ⊢ µ, µ; *e* ↪<sub>s</sub> µ<sub>1</sub>; *e*<sub>1</sub> *and* µ; *e* ↪<sub>s</sub> µ<sub>2</sub>; *e*<sub>2</sub> *then* *e*<sub>1</sub> = *e*<sub>2</sub> *and* µ<sub>1</sub> = µ<sub>2</sub>.

Furthermore, the preservation (Theorem 2) and progress (Theorem 3) properties of the λ<sub>gpr</sub> calculus are shown below:

Theorem 2 (Type Preservation of λ<sub>gpr</sub>). *If* Σ; · ⊢<sub>s</sub> *e* : *A*, Σ ⊢ µ *and* µ; *e* ↪<sub>s</sub> µ′; *e*′ *then* Σ′; · ⊢<sub>s</sub> *e*′ : *A*, Σ′ ⊢ µ′ *and* Σ′ ⊇ Σ.

Theorem 3 (Progress of λ<sub>gpr</sub>). *If* Σ; · ⊢<sub>s</sub> *e* : *A* *then* *e* *is a value or* ∃ *e*′ µ′, µ; *e* ↪<sub>s</sub> µ′; *e*′.


Fig. 5: Bidirectional typing for the $\lambda_{gpr}$ calculus.

#### 3.4 Bidirectional Typing

We also present a set of bidirectional typing rules (shown in Figure 5) for $\lambda_{gpr}$. Although bidirectional typing is not essential for $\lambda_{gpr}$, it is used later in the proofs of the gradual typing criteria. The typing judgment is written $\Sigma; \Gamma \vdash e \Leftrightarrow A$: the expression $e$ either infers ($\Rightarrow$) or is checked against ($\Leftarrow$) type $A$ under the typing context $\Gamma$ and the location typing context $\Sigma$. The typing mode ($\Leftrightarrow$) stands for either the inference mode ($\Rightarrow$) or the checking mode ($\Leftarrow$), as shown at the top of Figure 5. One extra rule is rule sty-eq, which switches modes. We proved that the two type systems are equivalent:

Lemma 1 (Typing Equivalence for $\lambda_{gpr}$). $\Sigma; \Gamma \vdash_s e : A$ *iff* $\Sigma; \Gamma \vdash_s e \Leftrightarrow A$.

# 4 The $\lambda^{G}_{gpr}$ Calculus

This section introduces the $\lambda^{G}_{gpr}$ calculus, which gradualizes the $\lambda_{gpr}$ calculus. Normally, a gradually typed lambda calculus (GTLC) does not define the operational semantics directly, but is elaborated to a cast calculus. $\lambda^{G}_{gpr}$ instead defines the dynamic semantics directly using the TDOS approach [15]. $\lambda^{G}_{gpr}$ is proved to be type sound and to satisfy the gradual guarantee. The calculus does not have parametricity, which enables simplifications in the calculus and the addition of features such as gradual references, which none of the previous gradual calculi with polymorphism support.

Fig. 6: $\lambda^{G}_{gpr}$ syntax and consistency ($\Gamma \vdash A \sim B$).

#### 4.1 Static Semantics

*Syntax, type well-formedness and consistency.* Figure 6 shows the syntax and consistency of the $\lambda^{G}_{gpr}$ calculus. The gray parts are the same as in $\lambda_{gpr}$. Compared with $\lambda_{gpr}$, the $\lambda^{G}_{gpr}$ calculus extends types with the unknown type $\star$. Because of the power of the unknown type $\star$, dynamic type checking is required and run-time errors may be raised. Therefore, in addition to expressions, $\lambda^{G}_{gpr}$ has the run-time error blame. Because the gradual type system requires run-time checking, we need annotations for type abstractions and lambda abstractions. Furthermore, due to the imprecision of the unknown type $\star$, values are also annotated. Otherwise, examples such as $1 : \star$ are troublesome. Because of the value forms, annotations are not included in frames, unlike in the $\lambda_{gpr}$ calculus. We will explain the details later.

Well-formed types are extended with the following rule for the unknown type $\star$:

$$\Gamma \vdash \star$$

Fig. 7: The type system for the $\lambda^{G}_{gpr}$ calculus.

Notably, instead of syntactic equality, a more general relation called consistency ($\Gamma \vdash A \sim B$) is defined in $\lambda^{G}_{gpr}$. Every well-formed type is consistent with itself. The unknown type is consistent with any other well-formed type. Structural types such as functions, references and polymorphic types are consistent if their type sub-components are consistent. Note that for two reference types, consistency is variant: if $A$ and $B$ are consistent then $\mathsf{Ref}\ A$ and $\mathsf{Ref}\ B$ are consistent. Unlike invariant consistency [31], types $A$ and $B$ do not have to be the same. As usual, consistency is reflexive and symmetric, but not transitive. We use the following abbreviation for consistency: $A \sim B \equiv \cdot \vdash A \sim B$.
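As a minimal sketch of the consistency relation described above, the following Python function checks $A \sim B$ on a small tuple-based encoding of types. The encoding, the name `consistent`, and the treatment of binders (both polymorphic types must use the same bound variable name, and the well-formedness context $\Gamma$ is ignored) are assumptions made for illustration; the authoritative rules are those of Figure 6.

```python
# Hypothetical encoding of types (an assumption of this sketch):
#   "Int", "Bool", "Unit"   base types
#   "*"                     the unknown type
#   ("var", "X")            type variable X
#   ("fun", A, B)           function type A -> B
#   ("ref", A)              reference type Ref A
#   ("all", "X", A)         polymorphic type forall X. A
STAR = "*"


def consistent(a, b):
    """Return True when types a and b are consistent (a ~ b)."""
    if a == STAR or b == STAR:
        return True                  # the unknown type is consistent with any type
    if isinstance(a, str) or isinstance(b, str):
        return a == b                # base types are only consistent with themselves
    if a[0] != b[0]:
        return False                 # different type constructors
    if a[0] == "var":
        return a[1] == b[1]
    if a[0] == "fun":
        return consistent(a[1], b[1]) and consistent(a[2], b[2])
    if a[0] == "ref":
        return consistent(a[1], b[1])    # variant: components need not be equal
    if a[0] == "all":
        # simplification: no alpha-renaming, binders must share a name
        return a[1] == b[1] and consistent(a[2], b[2])
    return False
```

For instance, `consistent("Int", "*")` and `consistent("*", "Bool")` both hold while `consistent("Int", "Bool")` does not, which illustrates why consistency is not transitive.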

*Typing relation.* Bidirectional typing is used to design the type system. Bidirectional typing is not essential for $\lambda_{gpr}$ but it is necessary for $\lambda^{G}_{gpr}$. Annotation expressions ($e : A$) and the checking mode ($\Leftarrow$) signal the use of casts (explicitly or implicitly) at run-time.

The typing rules of the $\lambda^{G}_{gpr}$ calculus are shown in Figure 7. They are almost the same as those of $\lambda_{gpr}$'s type system. In rules Typ-app, Typ-tapp, Typ-assign and Typ-deref, the unknown type $\star$ can be matched with a dynamic function type ($\star \to \star$), a dynamic polymorphic type ($\forall X.\ \star$) and a dynamic reference type ($\mathsf{Ref}\ \star$, used by both Typ-assign and Typ-deref), respectively. In a system with gradual typing and the unknown type $\star$ we always have to consider cases where the type may be unknown. For instance, in an application $e_1\ e_2$, $e_1$ can infer a function type as usual, but it can also infer type $\star$ and still be well-typed. So, a matching function ($A \rhd B$) is needed to account for both possibilities. The table at the bottom of Figure 7 shows the definition of the matching functions $A \rhd B$. Note that we overload the notation: there are three different matching functions, one in each column of the table, which are employed by the corresponding rules. For example, rule Typ-deref employs the matching function in the third column of the table. The first row in the table depicts the form of the matching function, while the other two rows give its definition.
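The three matching functions can be sketched as follows. This is an illustrative reading of the prose above, not a transcription of the table in Figure 7: the tuple encoding of types, the function names, and the choice of `"X"` as the bound variable when $\star$ is matched against a polymorphic type are all assumptions.

```python
# Hypothetical encoding of types (an assumption of this sketch):
#   "*" is the unknown type, ("fun", A, B) is A -> B,
#   ("all", "X", A) is forall X. A, and ("ref", A) is Ref A.
STAR = "*"


def match_fun(a):
    """Function matching: A -> B matches itself, * matches * -> *."""
    if a == STAR:
        return ("fun", STAR, STAR)
    if isinstance(a, tuple) and a[0] == "fun":
        return a
    return None                      # no match: the term is ill-typed


def match_all(a):
    """Polymorphic matching: forall X. A matches itself, * matches forall X. *."""
    if a == STAR:
        return ("all", "X", STAR)
    if isinstance(a, tuple) and a[0] == "all":
        return a
    return None


def match_ref(a):
    """Reference matching: Ref A matches itself, * matches Ref *."""
    if a == STAR:
        return ("ref", STAR)
    if isinstance(a, tuple) and a[0] == "ref":
        return a
    return None


def infer_deref(a1):
    """Type inferred for !(e) when e infers a1, in the style of rule Typ-deref."""
    matched = match_ref(a1)
    if matched is None:
        raise TypeError("dereference of a non-reference type")
    return matched[1]
```

With this sketch, dereferencing an expression of type $\star$ infers $\star$, while dereferencing an expression of type $\mathsf{Ref}\ \mathsf{Int}$ infers $\mathsf{Int}$.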

The checking mode rule Typ-sim is generalized to check whether the inferred type $A$ and the checked type $B$ are consistent. Note that rule Typ-sim is the only rule in the checking mode and, as such, does not overlap with anything else. Moreover, all the rules in the inference mode are syntax directed. Therefore, the rules are essentially directly implementable, as usual for bidirectional type-checking rules. Note that in $\lambda^{G}_{gpr}$ annotation expressions combined with consistency play an important role, since they allow more programs to be accepted. For instance, $(\lambda x.\ ((x : \star)\ 1) : \mathsf{Bool} \to \star)\ \mathsf{True}$ is accepted, but raises a blame error at run-time. Note that dynamically typed lambdas $\lambda x.\ e$ are syntactic sugar for $\lambda x.\ e : \star \to \star$. The use of this syntactic sugar enables us to encode the dynamically typed lambda calculus (DTLC) [4] easily in $\lambda^{G}_{gpr}$.

Definition 2 shows dynamic type checking for raw and annotated values, which is done at run-time. Dynamic type checking for values exploits the annotations that are present at run-time, and does not make use of the typing relation. Dynamic type checking is essentially a constant time operation, with little cost (note that the function is not recursive).

Definition 2 (Dynamic type). $|u|_\mu = A$ *and* $|v|_\mu = A$ *denote the dynamic type of the raw and annotated values.*

$$\begin{aligned} |i|_\mu &= \mathsf{Int} \\ |(\lambda x.\, e : A \to B)|_\mu &= A \to B \\ |(\Lambda X.\, e : A)|_\mu &= \forall X.\, A \\ |\mathsf{unit}|_\mu &= \mathsf{Unit} \\ |o|_\mu &= \mathsf{Ref}\ |v|_\mu \quad \text{when } o = v \in \mu \\ |(u : A)|_\mu &= A \end{aligned}$$

$|u|_\mu = A$ states that the dynamic type of the raw value $u$ is $A$ under store $\mu$. Notably, for a location $o$, the dynamic type is determined by the dynamic type of the value bound to $o$ in the store. The other rules are straightforward. Lemma 2 shows that if a raw value infers type $A$, then its dynamic type is $A$ as well.

Lemma 2 (Synthesis of Dynamic Types). *For any raw value u, if* $\Sigma \vdash \mu$ *and* $\Sigma; \cdot \vdash u \Rightarrow A$ *then* $|u|_\mu = A$.
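Definition 2 can be read as a simple lookup on the shape of a value. The sketch below follows the six equations of the definition; the tuple encoding of values and types, the dictionary representation of the store, and the name `dynamic_type` are assumptions made for illustration.

```python
# Hypothetical encoding (an assumption of this sketch):
#   raw values:  a Python int, ("unit",), ("loc", "o"),
#                ("lam", "x", body, A, B)   for  (lambda x. body : A -> B)
#                ("tlam", "X", body, A)     for  (Lambda X. body : A)
#   annotated values: ("ann", u, A)         for  u : A
#   store: dict mapping location names to annotated values
def dynamic_type(v, store):
    """Return the dynamic type |v|_mu of a raw or annotated value."""
    if isinstance(v, int):
        return "Int"
    tag = v[0]
    if tag == "unit":
        return "Unit"
    if tag == "lam":
        return ("fun", v[3], v[4])       # read off the annotation A -> B
    if tag == "tlam":
        return ("all", v[1], v[3])       # forall X. A
    if tag == "loc":
        # The store holds annotated values, so the stored value's dynamic
        # type is its annotation; no recursion is needed.
        _, _raw, annotation = store[v[1]]
        return ("ref", annotation)
    if tag == "ann":
        return v[2]
    raise ValueError("not a value")
```

Every case is answered by inspecting an annotation or performing one store lookup, which matches the observation that dynamic type checking is essentially a constant time operation.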

As in λ*gpr*, a term typed using the inference mode is guaranteed to infer a unique type. In addition, Lemma 3 shows that each well-typed term can be checked.

Lemma 3 (Synthesis principality). *If* $\Sigma; \Gamma \vdash e \Rightarrow A$ *then there exists B such that* $\Sigma; \Gamma \vdash e \Leftarrow B$ *and* $\Gamma \vdash A \sim B$.


$\mu; v \hookrightarrow_{A} \mu; r$ *(Casting for values)*

$$\frac{|u|_\mu \sim B}{\mu;\ u : A \hookrightarrow_{B} \mu;\ u : B}\ \text{CASTING-SIM} \qquad\qquad \frac{\neg\,(|u|_\mu \sim B)}{\mu;\ u : A \hookrightarrow_{B} \mu;\ \mathsf{blame}}\ \text{CASTING-NSIM}$$

$\mu; v \hookrightarrow_{B,A} \mu'; r$ *(Double casting)*

Fig. 8: Casting for values

#### 4.2 Dynamic Semantics

The dynamic semantics contains two parts. The first part is casting, which casts a value to another value with a target type. In casting the dynamic type of the value is the source type. The second part is the reduction rules.

*Casting.* Figure 8 shows the casting rules of the $\lambda^{G}_{gpr}$ calculus. $\mu; v \hookrightarrow_{A} \mu; r$ represents casting a value $v$ by type $A$ under store $\mu$. Casting checks whether the dynamic type of the raw value $u$ is consistent with type $A$. If the two types are consistent, then the intermediate type is removed and the raw value is annotated with the target type. Otherwise, a run-time error is raised. For example, when $1 : \star$ is cast by type Bool, the dynamic type of 1 is Int, which is not consistent with Bool, and blame is raised. In contrast, when $1 : \star$ is cast by type Int, the dynamic type Int is consistent with Int. Thus, type $\star$ is erased and 1 is annotated with type Int. Since a location $o$ is a raw value, its dynamic type must be obtained from the store $\mu$. Therefore, casting uses the store. Casting by two types is shown at the bottom of Figure 8. It simply casts by the two types one after the other, using the basic casting relation.
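The casting relation can be illustrated with a short sketch that replays the two examples above. The encoding of values and types, the restriction of raw values to integers, unit and locations, and the argument order of the hypothetical helper `cast_twice` are assumptions; the precise rules, including the order used by double casting, are those of Figure 8.

```python
# Hypothetical encoding (an assumption of this sketch): types are "Int", "Bool",
# "Unit", "*", ("var", X), ("fun", A, B), ("ref", A), ("all", X, A); annotated
# values are ("ann", u, A); the store maps location names to annotated values.
STAR = "*"
BLAME = "blame"


def consistent(a, b):
    """Consistency a ~ b (binders must share a name; context ignored)."""
    if a == STAR or b == STAR:
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if a[0] != b[0] or len(a) != len(b):
        return False
    if a[0] == "var":
        return a == b
    if a[0] == "all":
        return a[1] == b[1] and consistent(a[2], b[2])
    return all(consistent(x, y) for x, y in zip(a[1:], b[1:]))


def dynamic_type(u, store):
    """Dynamic type of a raw value (integers, unit and locations only)."""
    if isinstance(u, int):
        return "Int"
    if u == ("unit",):
        return "Unit"
    if u[0] == "loc":
        return ("ref", store[u[1]][2])   # annotation of the stored value
    raise ValueError("unsupported raw value in this sketch")


def cast(v, target, store):
    """Cast the annotated value v = u : A by the type `target`."""
    _, u, _old_annotation = v
    if consistent(dynamic_type(u, store), target):
        return ("ann", u, target)        # drop the old annotation, keep u
    return BLAME


def cast_twice(v, first, second, store):
    """Cast by two types, one after the other."""
    r = cast(v, first, store)
    return BLAME if r == BLAME else cast(r, second, store)
```

Here `cast(("ann", 1, "*"), "Bool", {})` yields blame, whereas `cast(("ann", 1, "*"), "Int", {})` yields the value `1 : Int`.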

*Reduction.* The reduction rules of the $\lambda^{G}_{gpr}$ calculus are shown in Figure 9. Raw values are reduced to values, which are annotated with the dynamic type of the raw values, by rule step-u. Due to this rule, annotations are not included in the frames. Annotated expressions are further handled by rule step-anno and rule step-annop. By the typing rules Typ-app, Typ-tapp, Typ-assign, and Typ-deref, type $\star$ is allowed to match a dynamic function type, a dynamic polymorphic type or a dynamic reference type, respectively. Moreover, we know that $\star$ is consistent with any type. Therefore, we should check whether the internal values fail to match the wanted type structure. For example, the application $((1 : \star)\ 2)$ is ill-formed, since the internal value (1) is not a lambda abstraction. There are similar examples for type applications and assignments: $(1 : \star)\ \mathsf{Bool}$ and $(\mathsf{True} : \star) := 2$, where 1 is not a type abstraction and True is not a location. Using rules vstep-betad, vstep-tapd, vstep-derefp, and vstep-assignd, we cast the value to the corresponding dynamic types and filter out programs with errors.

$\mu; e \hookrightarrow \mu'; r$ *(Operational semantics)*

$$\frac{\neg\,\mathsf{value}\ (e : A) \qquad \mu; e \hookrightarrow \mu'; e'}{\mu; e : A \hookrightarrow \mu'; e' : A}$$

Fig. 9: Reduction rules for $\lambda^{G}_{gpr}$.

When a functional value is applied to an argument value (rules vstep-beta and vstep-betap), the argument type must be consistent with the function input type $A_2$. Moreover, the expected type of the substituted value is $A_1$. Thus, the argument value is cast by $A_2$ and $A_1$, which may result in blame. To preserve the type, the substituted body is annotated with $B_1$ and $B_2$. When a value $v$ is annotated with a type $A$, the type of the value must be consistent with type $A$, and run-time checking is needed to validate consistency (rule vstep-annov). A reference value $\mathsf{ref}\ v$ is bound in the store at a fresh location $o$ (rule vstep-refv). To obtain the value stored at a location, we use rule vstep-deref. Note that in the typing rule for dereferences:

$$\frac{\Sigma; \cdot \vdash o : A_1 \Rightarrow A_1 \qquad A_1 \rhd \mathsf{Ref}\ A}{\Sigma; \cdot \vdash\ !(o : A_1) \Rightarrow A}\ \text{TYP-DEREF}$$

The expected type is $A$, but the type of the bound value is only known to be consistent with $A$. Thus we annotate $v$ with type $A$. Assigning a value to replace the bound value in a reference uses rules vstep-assign and vstep-assignp. Note that in the typing rule for assignments:

$$\frac{A \rhd \mathsf{Ref}\ A_2 \qquad \Sigma; \cdot \vdash o : A \Rightarrow A \qquad \Sigma; \cdot \vdash v_2 \Leftarrow A_2}{\Sigma; \cdot \vdash (o : A) := v_2 \Rightarrow \mathsf{Unit}}\ \text{TYP-ASSIGN}$$

The value bound to location $o$ has type $A_1$, while the type of $v_2$ is consistent with type $A_2$ and $A_2$ is consistent with $A_1$. The expected type of the replacement value is $A_1$, therefore $v_2$ is cast by types $A_1$ and $A_2$. Note that the cast may result in blame. When a polymorphic value is applied to a type (rule vstep-tap), the typing rule is:

$$\frac{B \rhd \forall X.\, B_2 \qquad \Sigma; \cdot \vdash (\Lambda X.\, e : A) : B \Rightarrow B}{\Sigma; \cdot \vdash ((\Lambda X.\, e : A) : B)\ C \Rightarrow B_2[X \mapsto C]}\ \text{TYP-TAPP}$$

The expected type is $B_2[X \mapsto C]$ but the substituted expression $(e[X \mapsto C] : A[X \mapsto C])$ has type $A[X \mapsto C]$, so it is annotated with type $B_2[X \mapsto C]$.

*Properties of* $\lambda^{G}_{gpr}$. $\lambda^{G}_{gpr}$ is deterministic (Theorem 4) and type sound (Theorem 5 and Theorem 6).

Theorem 4 (Determinism of $\lambda^{G}_{gpr}$). *If* $\Sigma; \cdot \vdash e \Leftrightarrow A$, $\mu; e \hookrightarrow \mu_1; r_1$ *and* $\mu; e \hookrightarrow \mu_2; r_2$ *then* $r_1 = r_2$ *and* $\mu_1 = \mu_2$.

Theorem 5 (Type Preservation of $\lambda^{G}_{gpr}$). *If* $\Sigma; \cdot \vdash e \Leftrightarrow A$, $\Sigma \vdash \mu$, *and* $\mu; e \hookrightarrow \mu'; e'$ *then* $\Sigma'; \cdot \vdash e' \Leftrightarrow A$, $\Sigma' \vdash \mu'$ *and* $\Sigma' \supseteq \Sigma$.

Theorem 6 (Progress of $\lambda^{G}_{gpr}$). *If* $\Sigma; \cdot \vdash e \Leftrightarrow A$ *then e is a value or* $\exists\, r\, \mu'$, $\mu; e \hookrightarrow \mu'; r$.

#### 4.3 Gradual Typing Criteria

Siek *et al.* [31,32] proposed a set of criteria for gradual typing systems. At one end of the spectrum, a fully annotated gradually typed program should behave as a statically typed program. Conversely, a gradually typed program without annotations should behave as a dynamically typed program. Siek *et al.* also proposed the gradual guarantee, which states that making annotations more or less precise should not change the behavior of programs. Here we show that $\lambda^{G}_{gpr}$ has the gradual guarantee.

Fig. 10: Precision relation.

To prove the gradual guarantee, we define precision for types, expressions and stores. At the top of Figure 10 is type precision $A \sqsubseteq B$, which states that type $A$ is more precise than $B$. The unknown type $\star$ is less precise than any other type. Each type is more precise than itself. Precision for function types, polymorphic types and reference types holds if precision holds for their sub-components. Note that the precision of function types is "covariant" in the argument types since, to compare the precision of the two programs:

$$\begin{aligned} \lambda x.\ 1 &: \mathsf{Int} \to \mathsf{Int} \\ \lambda x.\ 1 &: \star \to \mathsf{Int} \end{aligned}$$

we should say that the first one is more precise than the second one, because the input type of the second one is fully dynamic. Expression precision is shown in the middle of Figure 10. The rules can mostly be derived from type precision. Each expression is in a precision relation with itself. Structural expressions are in a precision relation if their sub-expressions are related. Lastly, store precision, shown at the bottom of Figure 10, holds if precision holds for the values in the store.
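Type precision as described above can be sketched as a structural comparison in which every position, including function arguments, is compared in the same direction. The tuple encoding of types and the name `more_precise` are assumptions of this sketch, and binders are compared by name for simplicity; the authoritative rules are those of Figure 10.

```python
# Hypothetical encoding of types (an assumption of this sketch): "Int", "Bool",
# "Unit", "*", ("var", X), ("fun", A, B), ("ref", A), ("all", X, A).
STAR = "*"


def more_precise(a, b):
    """Return True when type a is more precise than (or equal to) type b."""
    if b == STAR:
        return True                      # every type is more precise than *
    if isinstance(a, str) or isinstance(b, str):
        return a == b                    # base types (and * on the left)
    if a[0] != b[0] or len(a) != len(b):
        return False
    if a[0] == "var":
        return a == b
    if a[0] == "all":
        return a[1] == b[1] and more_precise(a[2], b[2])
    # "fun" and "ref": all components are compared in the same direction,
    # so function types are covariant in the argument position as well
    return all(more_precise(x, y) for x, y in zip(a[1:], b[1:]))
```

For the two annotations in the example, `more_precise(("fun", "Int", "Int"), ("fun", "*", "Int"))` holds, while the converse does not.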


Fig. 11: Reduction rules for λ*gpr*.

*Static criteria.* We show that the fully static fragment of the type system of $\lambda^{G}_{gpr}$ is equivalent to the type system of the $\lambda_{gpr}$ calculus (Theorem 7). We use the subscript $s$ to denote a relation from the static system in case of ambiguity. Theorem 8 shows the static gradual guarantee of $\lambda^{G}_{gpr}$: if a more precise program is well-typed, then a less precise program is also well-typed, with a less precise type.

Theorem 7 (Equivalence for $\lambda_{gpr}$ (statics)). $\cdot; \cdot \vdash_s e \Leftrightarrow A$ *if and only if* $\cdot; \cdot \vdash e \Leftrightarrow A$.

Theorem 8 (Static Gradual Guarantee). *If* $e_1 \sqsubseteq e_2$ *and* $\cdot; \cdot \vdash e_1 \Leftrightarrow A$ *then* $\cdot; \cdot \vdash e_2 \Leftrightarrow B$ *and* $A \sqsubseteq B$.

*Dynamic criteria.* Theorem 9 says that fully static programs of the $\lambda^{G}_{gpr}$ calculus behave in the same way as in $\lambda_{gpr}$ at run-time. To make the proofs easier, the reduction rules of the $\lambda_{gpr}$ calculus carry extra annotations to follow $\lambda^{G}_{gpr}$ (we denote this reduction relation as $\hookrightarrow_{s*}$). These extra annotations are shown in the gray parts of Figure 4. However, they are identical and can be removed without affecting the final reduction result. In addition, as in $\lambda^{G}_{gpr}$: values have annotations; raw values step to annotated values; and annotations are not included in frames. This requires a few extra rules, which are shown in Figure 11.

Notably, $\lambda^{G}_{gpr}$ has the dynamic gradual guarantee (Theorem 10). The proof is simple in comparison to the original proof by Siek *et al.* [32]. This simple theorem is formalized following the work of Garcia *et al.* [12]. It says that if a more precise program with a more precise store can reduce, then the less precise program with a less precise store can also reduce. Furthermore, the resulting programs and stores remain in the precision relation.

Theorem 9 (Equivalence for $\lambda_{gpr}$ (dynamic)). $\forall\ \cdot; \cdot \vdash_s e \Leftrightarrow A$,

– *If* $\mu; e \hookrightarrow_{s*} \mu'; e'$ *then* $\mu; e \hookrightarrow \mu'; e'$.

– *If* $\mu; e \hookrightarrow \mu'; e'$ *then* $\mu; e \hookrightarrow_{s*} \mu'; e'$.

Theorem 10 (Dynamic Gradual Guarantee). *If* $e_1 \sqsubseteq e_2$, $\mu_1 \sqsubseteq \mu_2$, $\cdot; \cdot \vdash e_1 \Leftrightarrow A$, $\cdot; \cdot \vdash e_2 \Leftrightarrow B$ *and* $\mu_1; e_1 \hookrightarrow \mu_1'; e_1'$ *then there exist* $e_2'$ *and* $\mu_2'$ *such that* $\mu_2; e_2 \hookrightarrow \mu_2'; e_2'$, $e_1' \sqsubseteq e_2'$ *and* $\mu_1' \sqsubseteq \mu_2'$.

# 5 Discussion

In this section, we briefly discuss alternative designs and possible extensions.

*Preserving relational parametricity.* An alternative design is a gradual polymorphic calculus with a direct semantics that preserves parametricity. We employ an eager semantics similar to that of the AGT methodology, which is applied in the GSF calculus. Toro *et al.* [37] analyzed the following example to show how parametricity is broken by the naive use of dynamic sealing in the eager semantics:

$$(\Lambda X.\ (\lambda x : X.\ \text{let } y : \star = x \text{ in } \text{let } z : \star = y \text{ in } z + 1))\ \mathsf{Int}\ 1$$

The polymorphic function with type $(\forall X.\ X \to \star)$ breaks parametricity, which should be detected at run-time by raising an error. However, the application of the function reduces to 2. A fresh type name $\alpha$ is generated and bound to the type Int. The flow from $x$ to $y$ goes from type Int to type $\alpha$; the flow from $y$ to $z$ goes from type $\star$ to type $\star$; and the flow from $x$ to $z$ goes from Int to $\star$. All of these type flows are safe. Thus the reason for the loss of parametricity is related to the loss of precise type information. Consequently, dynamic sealing is not enough to enforce relational parametricity. For the above example, GSF detects the error by refining evidences such as $\langle \alpha^{E_1}, \alpha^{E_2} \rangle$. Importantly, in the type flow from $y$ to $z$, more precise types (Int and $\alpha^{\mathsf{Int}}$) are obtained instead of $\star$ and $\star$, so when moving from $x$ to $z$ the type changes from Int to $\alpha^{\mathsf{Int}}$. When doing the addition, the run-time error is detected since the flow from $\alpha^{\mathsf{Int}}$ to Int is not defined. A potential approach for us is to use tracked types ($A^{\langle B_1, B_2 \rangle}$), which are similar to the refined evidences in the GSF calculus. Because $\lambda^{G}_{gpr}$ is a source language, we do not have evidences, thus a possible approach is to record information in types. For the above example, tracked types can track the unknown type with the more precise types Int and $\alpha^{\mathsf{Int}}$ from $y$ to $z$, giving $\star^{(\mathsf{Int}, \alpha^{\mathsf{Int}})}$, and then from $x$ to $z$ also $\star^{(\mathsf{Int}, \alpha^{\mathsf{Int}})}$, as with the refined evidences, so that a run-time error is detected when doing the addition.

*A space-efficient gradual polymorphic calculus.* Ozaki *et al.* [27] explored the space efficiency problem in the gradual polymorphic calculus. They extended the coercion calculus (λC) [29] with parametric polymorphism (called λC∀). Dynamic sealing was applied in λC∀ to enforce relational parametricity. Consequently, a sequence of coercions is allowed and they showed that it cannot be normalized to a smaller coercion. In other words, the size of sequences is unbounded. Notably, they stated and proved that λC∀ cannot be space-efficient when dynamic sealing is supported. Furthermore, they conjectured that the gradual polymorphic calculus with dynamic sealing cannot become space-efficient. Our $\lambda^{G}_{gpr}$ calculus substitutes types directly, as in the traditional semantics, without employing dynamic sealing. Moreover, the eager semantics is applied. Thus we believe that it is possible for our $\lambda^{G}_{gpr}$ calculus to be a space-efficient gradual polymorphic calculus. Two tentative and promising rules are as follows:

$$\frac{A \sim C}{e:A:B:C \hookrightarrow e:A:C} \quad \frac{\neg A \sim C}{e:A:B:C \hookrightarrow \mathsf{blame}}$$

With the above two rules, annotations are removed or an error is raised, to achieve the space-efficient goal. Surprisingly, with these two rules, it seems possible to have a space-efficient gradual references calculus naturally. We intend to explore this in the future.
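To make the effect of the two tentative rules concrete, the sketch below repeatedly applies them to a chain of annotations $e : A : B : C : \ldots$, always rewriting the three leftmost annotations. The list representation of the chain, the names `collapse` and `consistent`, the tuple encoding of types, and the left-to-right strategy are assumptions of this illustration, not part of the calculus.

```python
# Hypothetical encoding of types (an assumption of this sketch): "Int", "Bool",
# "Unit", "*", ("var", X), ("fun", A, B), ("ref", A), ("all", X, A).
STAR = "*"
BLAME = "blame"


def consistent(a, b):
    """Consistency a ~ b (binders must share a name; context ignored)."""
    if a == STAR or b == STAR:
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if a[0] != b[0] or len(a) != len(b):
        return False
    if a[0] == "var":
        return a == b
    if a[0] == "all":
        return a[1] == b[1] and consistent(a[2], b[2])
    return all(consistent(x, y) for x, y in zip(a[1:], b[1:]))


def collapse(annotations):
    """Collapse the chain e : A : B : C ... given as the list [A, B, C, ...].

    Each step applies one of the two rules to the three leftmost annotations:
    the middle one is dropped when A ~ C, otherwise blame is raised.
    """
    chain = list(annotations)
    while len(chain) >= 3:
        a, c = chain[0], chain[2]
        if not consistent(a, c):
            return BLAME
        chain = [a, c] + chain[3:]
    return chain
```

Under this strategy a chain of any length shrinks to at most two annotations or to blame, which is the bounded-size behavior the rules aim for.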

*Implicit polymorphic references.* Implicit (higher-rank) polymorphism [10,26,19] is pervasive in theoretical and practical programming languages. Existing gradual polymorphic calculi are mainly explicitly polymorphic. One exception is the work of Xie *et al.* [41]. In explicit polymorphism, a polymorphic type is not related to any of its instantiated types, whereas in implicit polymorphism they are related. Xie *et al.* [41] designed a source gradual implicit polymorphism calculus with consistent subtyping, but their dynamic semantics is defined by translation to the well-known polymorphic blame calculus (λB∀) [3], without a proof of the dynamic gradual guarantee. A possible extension of Xie *et al.*'s work is to support implicit polymorphism with a direct dynamic semantics, and to explore the dynamic gradual guarantee and parametricity properties. However, it is well-known that a naive combination of implicit polymorphism and references leads to an unsound language. A possible solution is to limit polymorphism to syntactic let-bound values, as adopted by Standard ML [40].

*Alternative forms of values.* In our calculus, all values are annotated, such as $1 : \mathsf{Int}$ or $(\lambda x.\ x : \mathsf{Int} \to \mathsf{Int}) : \mathsf{Int} \to \mathsf{Int}$. This introduces some overhead as some annotations are redundant. We can have an alternative and workable form of values as follows:

$$\text{iv} ::= u \mid u : \star \mid (\Lambda X.\, e : A) : \forall X.\, B \mid (\lambda x.\, e : A_1 \to B_1) : A_2 \to B_2$$

The above value form removes redundant annotations such as integers (1 : Int). This is good for performance, but it would make the proof of dynamic gradual guarantee harder. However, the resulting calculus with fewer annotations should have an equivalent semantics to our calculus, and would be a better candidate for guiding an implementation.

# 6 Related Work

*Gradual typing.* Gradual typing is a term coined by Siek *et al.* [31]. The unknown type ?, which we represent as $\star$, is the new notion introduced in a gradual type system to integrate dynamic and static typing. By using the unknown type $\star$, equality on types is lifted to consistency. Any type is consistent with type $\star$. Therefore, run-time type checking is needed for a gradually typed lambda calculus. Traditionally, the dynamic semantics of a gradual language is defined by elaborating to a target language, which includes cast calculi [39,34,29,11,3] and coercion calculi [13,14,30,29,27].

Garcia *et al.* [12] proposed the abstracting gradual typing (AGT) approach, which allows for deriving a gradual type system by lifting the static type system. They discuss the weaknesses of elaborating to a target language, and avoid resorting to a target language in their calculus by using intrinsic terms. Our $\lambda^{G}_{gpr}$ defines the dynamic semantics directly without using intrinsic terms, employing instead an approach based on type-directed operational semantics (TDOS). TDOS was proposed by Huang *et al.* [15] to design calculi with the merge operator and intersection types. Ye *et al.* [42] explored the use of TDOS in gradual typing. In TDOS, type annotations are relevant at run-time and can affect the semantics, unlike many traditional calculi where types are not run-time relevant. With a TDOS we can design a gradually typed calculus without elaboration to a cast calculus, since the semantics can be given directly. Our $\lambda^{G}_{gpr}$ employs the eager semantics for higher-order values, following an approach similar to AGT. Ye *et al.* only consider a TDOS for a simply typed, purely functional language. Our work shows that the TDOS approach can be extended to important features, such as polymorphism and references.

*Gradual typing with references.* Many languages with static and dynamic typing, employing some form of optional typing, support references. These include Flow [8], Dart [6] and TypeScript [5]. However, for optional typing, run-time checking is not performed for fully dynamic programs, leading to unsoundness with respect to the static type system. Siek *et al.* [31] already considered mutable references, but in a very simple setting without annotation expressions. Furthermore, their gradually typed lambda calculus is elaborated to a target language to define the dynamic semantics. Herman *et al.* [14] designed a coercion calculus with references, which is space efficient. A gradualizer, introduced by Cimini and Siek [9], can systematically derive a gradual static type system and cast insertion with references. Toro *et al.* [38] designed a source gradual type system with references <sup>λ</sup>*REF* <sup>g</sup> and a corresponding target language λ *REF* <sup>g</sup> using the Abstracting Gradual Typing (AGT) methodology. They designed the λ *REF* <sup>g</sup> as a space-efficient calculus and proved the gradual guarantee. Our $\lambda^{G}_{gpr}$ is the first polymorphic gradually typed language with references.

*Existing gradual polymorphic calculi.* In the following we summarize some of the solutions to the problem of preserving parametricity and gradual guarantee in gradual polymorphic calculi and the changes that these solutions entail.

*Dynamic sealing.* Ahmed *et al.* [3] solved the problem in Section 2 by using dynamic sealing, inspired by the work of Matthews *et al.* [21]. They proposed the polymorphic blame calculus [3] (which we write as λB∀), a widely used cast calculus with dynamic sealing. The most interesting construct of λB∀ is the named type binding $\nu X := A.\ t$, which is introduced to record the instantiated type of a type variable. The programs in Section 2 behave as expected in λB∀:

$$\begin{aligned} & (K^\star : \star \Rightarrow \forall X.\ \forall Y.\ X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ \hookrightarrow^{*}\ & \nu Y := \mathsf{Int}.\ \nu X := \mathsf{Int}.\ (2 : X \Rightarrow \star : \star \Rightarrow X) \\ \hookrightarrow^{*}\ & 2 \\ & (K^\star : \star \Rightarrow \forall X.\ \forall Y.\ X \to Y \to Y)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ \hookrightarrow^{*}\ & \nu Y := \mathsf{Int}.\ \nu X := \mathsf{Int}.\ (2 : X \Rightarrow \star : \star \Rightarrow Y) \\ \hookrightarrow^{*}\ & \mathsf{blame} \end{aligned}$$

The first program succeeds and returns the first argument, while the second program fails, since the polymorphic information is recorded as $X := \mathsf{Int}$ and $Y := \mathsf{Int}$ in type bindings and the original type variable names are preserved in the casts. Notably, for higher-order values, λB∀ follows the lazy semantics of the blame calculus [39,29]. That is, for a function value, the checking is delayed until the function is applied to an argument value. Unfortunately, this results in unbounded space consumption for higher-order casts [13,14].

As Xie *et al.* [41] pointed out, the compatibility relation of λB∀ mixes explicit and implicit polymorphism to some extent, since it employs the following rule:

$$\frac{A[X \mapsto \star] < B}{\forall X. A < B}$$

This compatibility rule of λB∀ allows $\forall X.\ X \to X$ to be compatible with any static instantiated types such as $\mathsf{Int} \to \mathsf{Int}$ and $\mathsf{Bool} \to \mathsf{Bool}$. These types are not related in System F, so λB∀ is not a conservative extension of System F. The gradual guarantee has not been discussed for λB∀, but its authors show the parametricity property.

*The F<sub>G</sub> and F<sub>C</sub> calculi.* Igarashi *et al.* [17] improved on λ<sup>B</sup><sub>∀</sub>. They designed a source calculus (*F<sub>G</sub>*) and a target calculus (*F<sub>C</sub>*), which is a conservative extension of System F. The dynamic semantics of *F<sub>G</sub>* is indirect and defined by translation to *F<sub>C</sub>*. *F<sub>G</sub>* does not relate ∀*X*. *X* → *X* with static instantiations, but only with the dynamic instantiation ⋆ → ⋆. The type ⋆ → ⋆ is called quasi-polymorphic, since it is an instantiation of ∀*X*. *X* → *X*, similarly to what happens with implicit polymorphism. However, a type such as Int → Int is not quasi-polymorphic. Instead of binding types locally by (ν*X* := *A*.*t*), they made the type bindings global. Their reduction form Σ ▷ *f* ↪ Σ′ ▷ *f*′ is augmented with a store, which records the bound type variables *X* := *A*. The above example reduces in *F<sub>C</sub>* as follows.

$$\begin{aligned} & \Sigma \triangleright (K^\star : \star \Rightarrow \forall X. \forall Y. X \rightarrow Y \rightarrow X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ & \hookrightarrow^{*} \Sigma \triangleright (\Lambda X. \Lambda Y. K^\star : \star \Rightarrow X \rightarrow Y \rightarrow X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\ & \hookrightarrow^{*} \Sigma, X := \mathsf{Int}, Y := \mathsf{Int} \triangleright (K^\star : \star \Rightarrow X \rightarrow Y \rightarrow X)\ \mathsf{Int}\ 2\ 3 \\ & \hookrightarrow^{*} \Sigma \end{aligned}$$

Furthermore, they argue that type bindings generated locally lead to run-time overheads. Their observation is that type bindings are not required for every substitution, but only for casts with the dynamic type (⋆). Therefore, they employ two kinds of type variables, distinguished by labels: static type variables (X::S) and gradual type variables (X::G). Type application for a static type abstraction does not generate type bindings; bindings are generated only for gradual type abstractions. Parametricity and the static gradual guarantee are proved, although the proofs are not mechanized. However, the dynamic gradual guarantee is left as a conjecture. In addition, their static gradual guarantee is proved under some constraints on the type precision relation. In their precision relation, ∀*X*. *X* → *X* is more precise than ∀*X*. *X* → ⋆ but not than ∀*X*. ⋆ → *X*.

*The GSF calculus.* Toro *et al.* [37] presented the gradual polymorphic calculus GSF, which employs the Abstracting Gradual Typing (AGT) methodology. In AGT, casting of higher-order values is eager, in contrast to λ<sup>B</sup><sub>∀</sub> and *F<sub>C</sub>*. This avoids the problem of space consumption although, as New *et al.* [25] pointed out, the η principle (which ensures *V* ≡ λ*x*.*V x* in call-by-value languages) is broken. To preserve parametricity, GSF uses global dynamic sealing, which does not distinguish between static and gradual variables. Toro *et al.* also refine the presentation of evidence, which witnesses that the consistency judgement holds. Instead of simple evidences such as ⟨α, Int⟩, they employ sealing evidences such as ⟨α<sup>*E*</sup>, Int⟩. GSF satisfies parametricity but not the gradual guarantee. Importantly, they proved that the gradual guarantee is incompatible with parametricity.

*Parametricity with the Gradual Guarantee.* To achieve both parametricity and the gradual guarantee, New *et al.* [24] designed the *PolyG<sup>ν</sup>* calculus, which gives up the syntax of System F and requires users to provide different sealing options. They introduced the sealing syntax *seal<sup>X</sup> M*, which explicitly seals terms. With this user-provided syntax, the gradual guarantee and parametricity are proved. More recently, Labrada *et al.* [20] improved on GSF. They do not change the syntax of System F but insert plausible sealing forms during the elaboration from a gradual source language, named Funk, to a target cast calculus. They proved the gradual guarantee and parametricity for the target language, but for the source language (Funk), the gradual guarantee comes with a restriction on type applications, which can only be instantiated with base and variable types. Some of the main theorems are proved in Agda.

*Summary.* Keeping parametricity requires several compromises. For instance, we need to use a dynamic sealing mechanism instead of direct type substitution, which causes extra space and time consumption. In many of the earlier calculi, the gradual guarantee is not obtained. In the later calculi, the gradual guarantee is either restricted or we need to give up the syntax of System F. Traditionally, many works on gradual typing are based on two different calculi: a source gradually typed language, and a target cast/coercion calculus where casts/coercions are explicit. The dynamic semantics is defined by elaborating the source language to the target calculus. In other words, the semantics of the gradually typed language is given indirectly via a second, target language. All previously discussed works follow this indirect way of giving semantics to a gradually typed source language.

Furthermore, none of the gradually typed polymorphic calculi supports references. However, even for a static polymorphic calculus extended with mutable references, obtaining parametricity is highly non-trivial. As Ahmed *et al.* [2] stated: "*combining mutable references with polymorphism can be extremely tricky*." From the analysis of Jaber and Tzevelekos [18], we know that naively extending a polymorphic calculus with mutable references breaks parametricity. The reason is that common references can be instantiated with differently typed variables. Therefore, extending a gradual polymorphic calculus with mutable references is non-trivial, and none of the existing gradual languages with polymorphism supports references.

Table 1: Comparison among gradual polymorphism calculi. A × denotes no. A X denotes yes while X denotes partial yes.

Table 1 summarizes several features and differences in existing gradually polymorphic calculi.

# 7 Conclusion

In this paper, we design a static system λ<sub>gpr</sub> with polymorphism and references, together with its gradual counterpart λ<sup>G</sup><sub>gpr</sub>. λ<sup>G</sup><sub>gpr</sub> has a direct semantics that does not resort to a cast calculus. In λ<sup>G</sup><sub>gpr</sub>, the gradual guarantee is proved, but we give up parametricity. In exchange, our calculus can be simplified, since sophisticated mechanisms such as dynamic sealing are not needed. Our calculus follows the original semantics of System F, based on direct type substitutions, and avoids the extra space and time overhead required by mechanisms such as dynamic sealing. In the future, we could investigate whether there is a way to keep both the gradual guarantee and relational parametricity for the source language, or explore more efficient formulations of λ<sup>G</sup><sub>gpr</sub>.

Acknowledgements We are grateful to anonymous reviewers and our colleagues at the HKU PL group. This work has been sponsored by Hong Kong Research Grants Council projects number 17209520 and 17209821.

# References



# Modal Crash Types for Intermittent Computing<sup>⋆</sup>

Farzaneh Derakhshan, Myra Dotzel, Milijana Surbatovich, and Limin Jia

> Carnegie Mellon University, Pittsburgh PA, USA {fderakhs,mdotzel,milijans,liminjia}@andrew.cmu.edu

Abstract. Intermittent computing is gaining traction in application domains such as Energy Harvesting Devices (EHDs) that experience arbitrary power failures during program execution. To make progress, programs require system support to checkpoint state and re-execute after power failure by restoring the last saved state. This re-execution should be correct, i.e., simulated by a continuously-powered execution. We study the logical underpinning of intermittent computing and model checkpoint, crash, restore, and re-execution operations as computation on Crash types. We draw inspiration from adjoint logic and define Crash types by introducing two adjoint modality operators to model persistent and transient memory values of partial (re-)executions and the transitions between them caused by checkpoints and restoration. We define a Crash type system for a core calculus. We prove the correctness of intermittent systems by defining a novel logical relation for Crash types.

Keywords: intermittent computing · modal Crash type · logical relation

# 1 Introduction

Intermittent computing is gaining importance in application domains that require inaccessible or large-scale device deployments, such as wildlife monitoring [28], tiny satellites [22,29], or smart civil infrastructure [1]. As battery maintenance may be infeasible in these environments, programs can instead run on batteryless Energy Harvesting Devices (EHDs). An EHD can run solely off energy harvested from its environment, at the cost of being powered intermittently. The device harvests energy (e.g., via a solar panel) into a rechargeable buffer. Once the energy buffer is full, the device turns on and begins to compute, consuming the stored energy. When the buffer drains, the device turns off at an arbitrary location until it can recharge and repeat this operational cycle. A power failure erases volatile execution state (e.g., the program counter), while nonvolatile state persists. For programs to make progress, they require intermittent system support to save state at checkpoints and restore the saved state after power failure, potentially causing re-execution from the last checkpoint.

<sup>⋆</sup> This work was generously funded in part through National Science Foundation (NSF) Award 2007998, NSF Graduate Research Fellowship Program grants DGE1745016 and DGE2140739, and the CMU CyLab Security & Privacy Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsoring organizations.

T. Wies (Ed.): ESOP 2023, LNCS 13990, pp. 168–196, 2023. https://doi.org/10.1007/978-3-031-30044-8_7

As EHDs aim to enable long-term deployments with little or no maintenance, intermittent systems must execute programs reliably despite frequent power failures and partial executions. Initial systems [35,43,24] relied only on informal notions of correctness that left them susceptible to memory consistency bugs caused by reading the results of partial executions [23] or by allowing sensor reads from past executions to remain in the nonvolatile memory [39]. More recent work [41,40,9,13] provides formal frameworks and correctness criteria for reasoning about intermittent execution. More concretely, all intermittent executions of a program must be simulated by some continuously-powered execution [41]. In other words, intermittent execution should be idempotent. Even if the system induces multiple partial executions of a program due to power failure, the program should not generate a different result than it would on a single execution.

The correctness of an intermittent execution relies on checkpointing, restoring, and finalizing state upon reaching the next checkpoint; mistakes in these operations can lead to incorrect, non-idempotent behavior. Few works have tried to understand the fundamental logical underpinning of these operations. This work fills this gap by formalizing checkpointing, crash, restoration, and re-execution as computation on Crash types. Crash types capture the core notion of intermittent computing: some values and computations persist across power failures and others do not. For instance, nonvolatile memory state persists across power failure and reboots, while volatile memory does not. Conversely, partially computed results do (or rather should) not persist across power failures, while completed/checkpointed computations do. We call the former unstable values and computations and the latter stable values and computations. Our key insight is that the interactions between these stable and unstable components bear close resemblance to shifts in adjoint logic [8,36]. Computation of a stable value can only rely on locations that store stable values, while computation on unstable values can rely on both stable and unstable values. Moreover, checkpoint and restore operations can turn values of one type to the other. We define terms and their associated types so that each of the key intermittent computing operations must be well-typed under our Crash types.

We define a core calculus for intermittent computing and develop a type system for Crash types by using the two adjoint modality operators. The Crash type of an intermittent computation is Cunit = ↓(nat ⇝ ↑Cunit) ∨ ↓↑unit, which says that the computation will either encounter a power failure (the left disjunct), or succeed in producing a stable value (the right disjunct). In the former case, the computation is suspended until energy arrives, after which it will again act as an intermittent computation. This recursive definition captures the multiple re-executions of a computation under repeated power failures. To prove the correctness of intermittent systems, we define a novel logical relation for Crash types, indexed by the number of power failures, which relates a continuously-powered execution to an intermittent execution. While intermittent computing motivates our results, the methods we develop are generally applicable to other system failures with the same effect on persistent and transient storage.

This paper makes the following technical contributions:


Detailed proofs and definitions can be found in the extended TR [15].

# 2 Background

We provide background on intermittent computing and detail how checkpoint systems work to store and restore program state to handle power failures.

Intermittent Computing on EHDs. EHDs need intermittent system support to save necessary state before power failure and to restore it after reboot. When and where such checkpoints occur governs the intermittent execution model under which software executes. The two prevailing intermittent execution models are just-in-time (JIT) checkpoints [5,4] and atomic execution [23,24,43,37]. Under a JIT model, state is saved immediately prior to power failure so that execution resumes from the same point after reboot. Under an atomic execution model, state is saved at the beginning of an atomic region. If power fails before the end of the region, the system will reboot to the beginning of the region, re-executing until the region completes without power failure (akin to software transactions [38]). State-of-the-art intermittent systems use a hybrid "JIT + Atomics" model that defaults to JIT checkpoints except when there is an explicit atomic region [40,25,19]. Our core calculus follows this hybrid model.

To ensure idempotence, an intermittent system must save the value of volatile state and often a portion of the nonvolatile state. To illustrate why, consider an execution of the simple program in Fig. 1. The program has four variables stored in nonvolatile memory: x, y, and z of type int and u of type bool. It consists of two code blocks: an atomic region declared with the Ckpt construct (lines 1-7 on the left of Fig. 1) and a regular code block executed in JIT mode (lines 8-14 on the right). A continuous execution of the atomic region with initial state x = 2, y = 0, z = 1, u = ff ends in x = 2, y = 1, z = 1, u = tt. Now, suppose power fails after the execution of Line 2. Once the device recharges, the program restarts from the start of the atomic region. If the system does not restore y's original value, this re-run computes an incorrect result: x = 2, y = 2, z = 1, u = ff. Thus, to ensure idempotent execution, an intermittent system must checkpoint, i.e., save the value of, both volatile and nonvolatile memory. We next explain correct execution of the program in Fig. 1 for atomic and JIT modes.
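The failure scenario can be replayed with a small simulation. Since the listing of Fig. 1 is not reproduced here, the region body below is a hypothetical stand-in chosen only so that it yields the states quoted above; `run_region` and `intermittent` are illustrative names, and the sketch writes directly to nonvolatile memory to expose the bug.

```python
def run_region(mem, fail_after_line2=False):
    """Hypothetical atomic region body; returns False if power fails."""
    mem["y"] = mem["y"] + mem["z"]        # stands in for "Line 2"
    if fail_after_line2:
        return False                      # power failure: region not finished
    w = mem["x"] - mem["y"]               # local variable, recomputed on re-run
    mem["u"] = (w == mem["z"])
    return True


def intermittent(initial, restore):
    """One power failure after Line 2, then a re-run from the region start."""
    nv = dict(initial)
    snapshot = {name: nv[name] for name in ("y", "u")}   # checkpointed variables
    completed = run_region(nv, fail_after_line2=True)
    assert not completed
    if restore:
        nv.update(snapshot)               # roll back to the checkpointed values
    run_region(nv)
    return nv


initial = {"x": 2, "y": 0, "z": 1, "u": False}

continuous = dict(initial)
run_region(continuous)                    # x = 2, y = 1, z = 1, u = tt

buggy = intermittent(initial, restore=False)   # x = 2, y = 2, z = 1, u = ff
correct = intermittent(initial, restore=True)  # same as the continuous run
```

Only `y` and `u` are snapshotted in this sketch because `x` and `z` are never written by the region body.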

Atomic Region Execution. As EHDs are highly resource constrained, the system should save state judiciously; checkpointing all of nonvolatile memory is expensive and unnecessary. For example, variables in an atomic region that are read-only (i.e., never updated) do not change value and need not be checkpointed. In our example, x and z are read-only, so checkpointing y and u is enough to ensure correct intermittent execution. Many intermittent systems follow this design of checkpointing all variables that are not read-only [37,19,17,26,44,12]. Given such a system, Fig. 2 shows an execution of the atomic region in Fig. 1. For now, ignore the last two columns about typing. To save and restore state, the system follows redo-log semantics. It records updates to checkpointed variables in a special volatile region, not main memory. This region clears if power fails, throwing out partial updates. Upon reaching the next atomic or JIT region, the system commits the updates by copying them back to main memory.

Fig. 1. An example program with an atomic region and a JIT region

Fig. 2. Intermittent execution of an atomic region. We write i for int and b for bool.

Row (0) shows initial nonvolatile locations, their values, and the mapping between variables and memory locations; locations ℓ<sub>1</sub>, ℓ<sub>2</sub>, ℓ<sub>3</sub>, and ℓ<sub>4</sub> in the nonvolatile memory correspond to variables x, y, z, and u, respectively. When starting the atomic region (Row (1)), the system takes a snapshot of ℓ<sub>2</sub> and ℓ<sub>4</sub> and stores it in the volatile region V<sub>1</sub>. We mark the original nonvolatile locations as checkpointed with the superscript ck, i.e., ℓ<sub>2</sub><sup>ck</sup> and ℓ<sub>4</sub><sup>ck</sup>. Checkpointed locations ℓ<sub>2</sub><sup>ck</sup> and ℓ<sub>4</sub><sup>ck</sup> remain untouched for the remainder of the atomic region execution. Every access to variables y and u will instead be associated with their volatile copies ℓ<sub>2</sub> and ℓ<sub>4</sub>, e.g., the assignment in Line 2 is applied to the volatile logs of Row (2).

On power failure, all volatile memory clears (Row (3)), throwing out the log. The system shuts down until more energy is harvested, at which point the system regenerates the volatile copies ℓ<sub>2</sub> and ℓ<sub>4</sub> (Row (4)) and resumes execution from Line 2. When the execution of the atomic region is complete (Row (5)), the system commits the updated values of the checkpointed locations (ℓ<sub>2</sub> and ℓ<sub>4</sub>) from volatile memory to their original nonvolatile locations (Row (6)). During execution, local variables are stored to volatile memory via a let construct, e.g., location ℓ<sub>5</sub> for variable w on Line 3, corresponding to a volatile execution stack. On power failure, the device clears all volatile memory, but such stack-allocated locations will be recreated upon re-execution.
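The redo-log behavior walked through in Rows (0) to (6) can be sketched as follows. This is a simplified model under stated assumptions, not the system's implementation: the class `AtomicRegion`, its methods, and the region body `body` are hypothetical, and memory is modeled as Python dictionaries keyed by variable name rather than by location.

```python
class AtomicRegion:
    """Redo-log model: writes go to a volatile log, never to main memory."""

    def __init__(self, nonvolatile, checkpointed):
        self.nv = nonvolatile            # main (nonvolatile) memory
        self.ck = tuple(checkpointed)    # variables that are not read-only
        self.log = {}                    # volatile copies

    def start(self):
        # Rows (1) and (4): (re)create volatile copies of checkpointed variables.
        self.log = {name: self.nv[name] for name in self.ck}

    def read(self, name):
        return self.log[name] if name in self.log else self.nv[name]

    def write(self, name, value):
        if name not in self.ck:
            raise ValueError(f"{name} is read-only in this region")
        self.log[name] = value

    def power_failure(self):
        # Row (3): volatile memory clears, discarding partial updates.
        self.log = {}

    def commit(self):
        # Row (6): copy the logged values back to main memory.
        self.nv.update(self.log)
        self.log = {}


def body(region):
    """Hypothetical region body standing in for the listing of Fig. 1."""
    region.write("y", region.read("y") + region.read("z"))
    w = region.read("x") - region.read("y")      # local, lives on the stack
    region.write("u", w == region.read("z"))


nv = {"x": 2, "y": 0, "z": 1, "u": False}
region = AtomicRegion(nv, checkpointed=("y", "u"))

region.start()
region.write("y", region.read("y") + region.read("z"))   # partial execution
after_partial = dict(nv)                                  # main memory untouched
region.power_failure()

region.start()                                            # re-execution
body(region)
region.commit()
```

Because partial updates never reach main memory in this model, a re-execution after `power_failure` starts from the same nonvolatile state as the first attempt.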

JIT Region Execution. The JIT execution model prevents re-execution, so the intermittent system only saves and restores volatile state at checkpoints. Fig. 3 shows the details of executing the code on the right of Fig. 1 in JIT mode. Row (0) shows the initial nonvolatile locations, their values, and the mapping from variables to locations. The system starts the JIT region by creating an empty context to be populated by volatile locations (Row (1)). The let construct in Line 8 allocates a fresh location ℓ<sub>5</sub> in V<sub>2</sub> and updates the mapping to associate variable w to ℓ<sub>5</sub>. On a power failure in JIT mode, the system creates a nonvolatile copy of the volatile location ℓ<sub>5</sub> just before it loses the location (Row (3)). It marks the nonvolatile copy with the superscript ck. When restoring the program, the system restores these copies to volatile memory and dismisses the nonvolatile backups (Row (4)). The program then continues with the if clause on lines 9-12, finally dropping the volatile location ℓ<sub>5</sub>, as it is out of scope (Row (5)).
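A minimal sketch of the JIT save and restore steps, again with memory modeled as dictionaries. The function names, the `^ck` key suffix, and the value chosen for `w` are assumptions made for illustration only.

```python
CK = "^ck"   # suffix marking a nonvolatile backup copy (illustrative encoding)


def jit_power_failure(nonvolatile, volatile):
    """Row (3): back up every volatile location just before it is lost."""
    for name, value in volatile.items():
        nonvolatile[name + CK] = value
    volatile.clear()


def jit_restore(nonvolatile, volatile):
    """Row (4): move the backups to volatile memory and dismiss them."""
    for key in [k for k in nonvolatile if k.endswith(CK)]:
        volatile[key[: -len(CK)]] = nonvolatile.pop(key)


nv = {"x": 2, "y": 1, "z": 1, "u": True}
v = {"w": True}                      # hypothetical value of the local variable w

jit_power_failure(nv, v)
during_outage = (dict(nv), dict(v))  # w survives only as a nonvolatile backup

jit_restore(nv, v)                   # execution can now resume where it stopped
```

Unlike the atomic case, nothing is rolled back here: the nonvolatile variables keep whatever values they had at the point of failure.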

Fig. 3. Intermittent execution of a JIT region. We write i for int and b for bool.

# 3 Key Ideas of Crash Types

We present the intuition behind the stable and unstable memory types (Sec. 3.1), Crash types which internalize checkpointing, power failure/crash, restoration, re-execution, and finalization of atomic regions (Sec. 3.2), and the independence principle applied to intermittent computing (Sec. 3.3).

#### 3.1 Modal Store Types

An unstable value is an intermediate result of an execution towards a stable value and will be lost upon a power failure. However, if the result of a partial execution is committed to a nonvolatile location, it will persist and is thus stable. To reflect the behavior of a memory location in its type, we introduce two (adjoint) modalities ↑<sup>s</sup><sub>u</sub> (read as "up shift from unstable to stable") and ↓<sup>s</sup><sub>u</sub> (read as "down shift from stable to unstable"), where ↑<sup>s</sup><sub>u</sub> τ indicates that the location stores a stable value of type τ and ↓<sup>s</sup><sub>u</sub> τ indicates that the location stores an intermediate result of an execution toward a value of type τ. To fully capture how intermittent execution interacts with a memory location, we also annotate the type of a memory location with an access qualifier, RD or CK, that represents whether the location is read-only or checkpointed by the system, respectively.

In our example in Fig. 2, the read-only variable x is stored in nonvolatile memory, so it has type x : ↑<sup>s</sup><sub>u</sub> int@RD. The checkpointed variable y has type y<sup>ck</sup> : ↑<sup>s</sup><sub>u</sub> int@CK in the nonvolatile memory, while y's volatile copy has type y : ↓<sup>s</sup><sub>u</sub>↑<sup>s</sup><sub>u</sub> int@CK. We use the context Ω to type nonvolatile memory and the context Σ to type volatile memory, as shown in the third columns of Figs. 2 and 3. We drop the superscript s and subscript u from the modalities for brevity.

#### 3.2 Crash Types

To capture the effects of intermittent execution in the type of expressions and commands, we introduce Crash types, as the notion of stable and unstable values is insufficient. One might expect the expression x − y to have the type ↓↑int as it is a (partial) execution towards computing a stable integer value. However, this type does not account for steps due to power failure: the crash itself, waiting for the device to charge, restoration, and re-execution. To reflect these runtime system steps at the type level, we assign the expression a type in the form of a disjunction ? ∨ ↓↑int, where ? is a type for computations that handle power failures. This type means that the expression either power fails, or completes its execution that evaluates to int. Next, we fill in ? for commands and expressions. ? is a recursive type since it handles re-execution.

Commands. The Crash type for commands is Cunit = ↓(nat ⇝ ↑Cunit) ∨ ↓↑unit. The right disjunct states that if no power failure occurs while executing a command, then it computes a stable value of type unit. The left disjunct states that on power failure, the computation continues as a function; after receiving a (logical) energy input from the environment, it becomes a computation that yields a stable value of a command type, i.e., Cunit. This computation will execute after the restore, which differs for atomic and JIT modes. In an atomic region, the system re-executes the region from the beginning, and in a JIT region, the system continues with the same command that was interrupted by the failure.

Expressions. The definition of the Crash type for expressions depends on the execution mode, just as the continuation of the program after a power failure depends on the mode. In an atomic region, the system restores an interrupted run of the expression to the original command enclosed in the region, so the type of an atomic mode expression is C<sup>atom</sup><sub>A</sub> = ↓(nat ⇝ ↑Cunit) ∨ ↓↑A, where the left disjunct is the same as that of a command. On the other hand, an interrupted run of an expression in JIT mode will be restored to the expression itself. Hence, the type of a JIT mode expression is C<sup>jit</sup><sub>A</sub> = ↓(nat ⇝ ↑C<sup>jit</sup><sub>A</sub>) ∨ ↓↑A, where the left disjunct states that after power failure and reception of the energy input, the computation again yields a stable value of a JIT mode expression type.
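The recursive shape of these types can be mirrored by an ordinary recursive data type: an outcome is either a finished stable value, or a crash that carries a function from an energy input to another outcome of the same kind. The following sketch is an informal analogy, not the paper's calculus; `Done`, `Crashed`, `run`, and the energy-metered `jit_sum` are hypothetical names, and every delivered charge is assumed to be positive.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple, Union


@dataclass
class Done:
    value: object                        # right disjunct: a stable result


@dataclass
class Crashed:
    resume: Callable[[int], "Outcome"]   # left disjunct: energy in, outcome out


Outcome = Union[Done, Crashed]


def run(outcome: Outcome, charges: Iterator[int]) -> Tuple[object, int]:
    """Feed charges to a crashed computation until it completes."""
    failures = 0
    while isinstance(outcome, Crashed):
        outcome = outcome.resume(next(charges))
        failures += 1
    return outcome.value, failures


def jit_sum(items, energy, index=0, acc=0) -> Outcome:
    """JIT-style computation: each addition costs one unit of energy."""
    while index < len(items):
        if energy == 0:
            # Resume from the interrupted point, as in JIT mode.
            return Crashed(lambda e: jit_sum(items, e, index, acc))
        acc += items[index]
        index += 1
        energy -= 1
    return Done(acc)


# Initial energy 2, then charges of 2 and 1 after each power failure.
result, failures = run(jit_sum([1, 2, 3, 4, 5], 2), iter([2, 1]))
```

An atomic-mode variant would differ only in what `resume` captures: instead of the interrupted position, it would restart the enclosing region from its beginning, which corresponds to the left disjunct of the atomic type referring to Cunit rather than to the expression's own type.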

#### 3.3 Independence Principle for Typing Intermittent Execution

We design our typing rules to follow the rules for ↓ and ↑ modalities in adjoint logic. We introduce two judgment categories. The first category (J<sub>s</sub>) is for deriving stable types and corresponds to the judgments of the form Ω ⊢ τ<sup>s</sup>, meaning that the rules can rely only on stable locations to evaluate computation on a stable type. The second category (J<sub>u</sub>) is for deriving unstable types and corresponds to the judgments of the form Ω; Σ ⊢ τ<sup>u</sup>, meaning that the rules can rely on both stable and unstable locations to evaluate computation on an unstable type.

The adjoint modalities allow going back and forth between judgments J<sub>s</sub> and J<sub>u</sub>, mirroring checkpointing and restoration operations. The following four sequent calculus rules in the underlying logic govern this back-and-forth behavior in our system. The rules are derivable from the more general rules in prior work [8,34,36]; in particular, the ↑L<sup>∗</sup> rule can be derived from a cut rule and ↓L. Typical of sequent calculus style rules, we read them bottom-up and match each execution step of a command with the reading of a corresponding rule. Next, we illustrate this matching using the execution steps in Figs. 2 and 3.

$$\frac{\Omega; \cdot \vdash \tau^{u}}{\Omega \vdash \uparrow \tau^{u}} \; \uparrow R \quad \frac{\Omega, \uparrow A^{u}; \Sigma, \downarrow \uparrow A^{u} \vdash \tau^{u}}{\Omega, \uparrow A^{u}; \Sigma \vdash \tau^{u}} \; \uparrow L^{*} \qquad \frac{\Omega \vdash \tau^{s}}{\Omega; \Sigma \vdash \downarrow \tau^{s}} \; \downarrow R \quad \frac{\Omega, \uparrow A^{u}; \Sigma \vdash \tau^{u}}{\Omega; \Sigma, \downarrow \uparrow A^{u} \vdash \tau^{u}} \; \downarrow L$$

Shifts in Atomic Mode (Fig. 2): A combination of ↑R and two ↑L<sup>∗</sup> rules corresponds to creating a volatile log from the nonvolatile locations when starting the atomic region, i.e., the step from Row (0) to Row (1). The last two columns in Row (0) correspond to the conclusion of a ↑R rule: Ω<sub>0</sub> ⊢ ↑Cunit. An application of ↑R from bottom to top drops the ↑ modality from the type of the program and opens an empty volatile region, i.e., Ω<sub>0</sub>; · ⊢ Cunit. Next, one application of ↑L<sup>∗</sup> copies the variable y of type ↑int to the volatile memory with the type ↓↑int. Similarly, the next application of ↑L<sup>∗</sup> copies the variable u of type ↑bool to the volatile memory with the type ↓↑bool. The same combination corresponds to creating a volatile log from a nonvolatile location when restarting the atomic region, i.e., the step from Row (3) to Row (4), again copying variables y and u to the volatile memory.

The ↓R rule corresponds to a power failure, which erases the volatile memory Σ. From Row (2) to Row (3) in Fig. 2, the system loses the volatile locations of y and u and closes off the volatile context. Row (2) corresponds to the conclusion of the rule, and Row (3) corresponds to its premise. The type of the command in Row (2) changes from Cunit to ↓(nat ⇝ ↑Cunit) (by another ∨-R rule as a crash is detected), and then to the type (nat ⇝ ↑Cunit) in Row (3).

Finally, a ↓L rule combined with a standard weakening rule and a ↓R rule corresponds to the final commit of the volatile context, i.e., stepping from Row (5) to Row (6), the nonvolatile context drops the locations y and u of types ↑int and ↑bool, respectively, by a weakening rule. These two variables map to the locations with outdated values. Next, the volatile locations of y and u in Σ′, which contain the up-to-date values, commit their values to the nonvolatile context by a ↓L rule. Then, a ↓R rule closes off the remaining volatile context, which contains w of type ↓↑int. The type of the command in Row (2) changes from Cunit to ↓↑unit (by a separate ∨-R rule as the system detects a successful execution) and from that to type ↑int in Row (6).

Shifts in JIT Mode (Fig. 3): A ↑R rule corresponds to creating an empty volatile context Σ<sub>1</sub> when starting the JIT region, i.e., the step from Row (0) to Row (1). A combination of the ↓L rule and ↓R rule corresponds to a power failure, i.e., the step from Row (2) to Row (3). A ↓L rule copies the location w of type ↓↑bool from volatile memory Σ<sub>2</sub> to nonvolatile memory Ωc. A ↓R rule closes off the (empty) nonvolatile memory. As in atomic mode, a combination of ↑R and ↑L<sup>∗</sup> rules corresponds to creating a volatile log from a nonvolatile location when restarting the command after the failure, i.e., the step from Row (3) to Row (4). The ↑R rule clears a portion of volatile memory, and the ↑L<sup>∗</sup> rule copies variable w from nonvolatile memory into volatile memory. We need an extra weakening rule to eliminate the remaining variable w in nonvolatile memory. The dropping of volatile memory at the end of execution (Row (5)) is not a modal step, but rather follows from a standard rule for the let clause.

# 4 A Basic Calculus for Intermittent Execution

We present the syntax, semantics, and the Crash type system for a basic calculus.

#### 4.1 Syntax

The syntactic constructs are summarized in Fig. 4. Expressions include constants, variables, and binary operations, while commands include assignments, mutable let bindings, sequencing, and if branching. A program consists of sequenced blocks of commands and atomic regions, denoted Ckpt[aID, ρ](c) with a unique identifier aID, read-only variables ρ, and the enclosed command c.

Nonvolatile memory (NV) and volatile memory (V) map locations ℓ to values. Each location is annotated with its access mode q (RD or CK). The nonvolatile memory location ℓ<sup>ck</sup> is the checkpointed copy of location ℓ in volatile memory. The context γ maps variable names to memory locations. Access mode qualifiers in V and NV have constrained values (to be discussed in the semantics).

#### Command, expression, and memory


#### Instructions, statements, and configurations.

- commands: c ::= · · · | c ;<sup>W</sup> c
- crash instrs: i ::= ↓ε # in(b > 0, ↑κ)
- continuations: κ ::= c | e | ε # in(b > 0, ↑κ) | ↑κ
- statements: s ::= κ | i | p
- open config: K<sup>o</sup> ::= (γ | Md | g | NV | V | s) | (γ | Md | g | NV | s)
- energy level: g ::= · | n
- charge stream: χ ::= n :: χ
- closed config: K<sup>c</sup> ::= [χ ▷ ε] ⊗ K<sup>o</sup>
- exec. mode: Md ::= aID(c) | jit

Fig. 4. Summary of syntax

The runtime instruction c<sub>1</sub> ;<sup>W</sup> c<sub>2</sub> is used for evaluating c<sub>1</sub> under the execution context W. To model energy harvesting from the environment, we assume a unique external energy channel, ε, from which the system receives energy. Three crash instructions control the system in the event of a power failure. The instruction ↓ε # in(b > 0, ↑κ) models the system that faces a power failure, where κ is the interrupted command or expression, and b > 0 is a guard to ensure that the bound incoming energy variable b is positive. The instruction ε # in(b > 0, ↑κ) models the system awaiting an energy input to be bound to b. The instruction ↑κ models the system ready to restore memory and re-execute.

We write K<sup>o</sup> to denote an open system configuration, consisting of the mapping γ, the mode of execution Md (i.e., atomic or JIT), the energy g available for this execution, the memories, and the statement s to be executed. The energy level (·) models the state right after a power failure. We close an open configuration with [χ ▷ ε]; that is, we connect it via an external energy channel ε to an infinite charging stream χ of natural numbers, which models the energy the configuration harvests from the environment at each power failure point for re-execution.

We call a configuration that cannot take a step a value configuration (value for short). An open configuration of the form (· · · | g | · · · | s) is a value, i.e., Val(· · · | g | · · · | s), if either s is a constant or skip, it has depleted all energy for this execution (g = 0), or s is a crash instruction. The latter two cases are values because they cannot take a step without interacting with the environment or performing operations on the volatile and nonvolatile memory specific to handling power failures. A closed configuration is a value only if the statement s is skip with some energy left (g > 0). We list all values in the extended TR [15].
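The case analysis behind the value predicate can be sketched in a few lines of Python. This is only an illustration under assumed encodings: the class `OpenConfig`, the string `"skip"`, and the tuple tags in `CRASH_TAGS` are hypothetical stand-ins for the syntax of Fig. 4, and the memories and γ are omitted because the predicate does not inspect them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tags standing in for the three crash instructions
# (power failure, awaiting energy, ready to restore).
CRASH_TAGS = {"down_in", "in", "up"}

@dataclass
class OpenConfig:
    energy: Optional[int]  # None models the level "·" right after a power failure
    stmt: object           # statement s: a constant, "skip", or a tagged tuple

def is_crash_instr(stmt) -> bool:
    return isinstance(stmt, tuple) and len(stmt) > 0 and stmt[0] in CRASH_TAGS

def is_open_value(k: OpenConfig) -> bool:
    """Value if s is a constant or skip, energy is depleted, or s is a crash instruction."""
    is_const = isinstance(k.stmt, (int, bool))
    return is_const or k.stmt == "skip" or k.energy == 0 or is_crash_instr(k.stmt)

def is_closed_value(k: OpenConfig) -> bool:
    """A closed configuration is a value only if s is skip with energy left."""
    return k.stmt == "skip" and k.energy is not None and k.energy > 0
```

Under this encoding, a configuration holding an unfinished assignment with zero energy is an open value but not a closed one, which is what lets the closed semantics take over with the crash rules.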

#### 4.2 Operational Semantics

Top-level Program Execution. The top-level semantic rules for setting up and finalizing the atomic and JIT execution contexts are shown in Fig. 5. The P-Ckpt rule applies if the next code block is an atomic region. The nonvolatile

$$\dfrac{\begin{array}{c} n > 0 \quad \mathsf{InitWorld}_{d}(\mathsf{NV}; \rho; \gamma) = \mathsf{NV}_{0}, \mathsf{V}_{0} \\ [\chi \rhd \varepsilon] \otimes \gamma \mid \mathsf{aID}(c) \mid n \mid \mathsf{NV}_{0} \mid \mathsf{V}_{0} \mid c \;\Rightarrow^{*}\; [\chi' \rhd \varepsilon] \otimes \gamma' \mid \mathsf{aID}(c) \mid n' \mid \mathsf{NV}' \mid \mathsf{V}' \mid \mathsf{skip} \\ n' > 0 \quad \mathsf{NV}_{1} = \mathsf{FinWorld}_{d}(\mathsf{NV}'; \mathsf{V}') \end{array}}{[\chi \rhd \varepsilon] \otimes \gamma \mid n \mid \mathsf{NV} \mid \mathsf{Ckpt}[\mathsf{aID}, \rho](c); p \;\Rightarrow\; [\chi' \rhd \varepsilon] \otimes \gamma \mid n' \mid \mathsf{NV}_{1} \mid p}\;(\text{P-Ckpt})$$

$$\dfrac{\begin{array}{c} n > 0 \quad n' > 0 \\ [\chi \rhd \varepsilon] \otimes \gamma \mid \mathsf{jit} \mid n \mid \mathsf{NV} \mid \cdot \mid c \;\Rightarrow^{*}\; [\chi' \rhd \varepsilon] \otimes \gamma' \mid \mathsf{jit} \mid n' \mid \mathsf{NV}' \mid \mathsf{V}' \mid \mathsf{skip} \end{array}}{[\chi \rhd \varepsilon] \otimes \gamma \mid n \mid \mathsf{NV} \mid c; p \;\Rightarrow\; [\chi' \rhd \varepsilon] \otimes \gamma \mid n' \mid \mathsf{NV}' \mid p}\;(\text{P-Seq})$$

Fig. 5. Closed configuration semantics for programs

NV<sub>0</sub> and volatile V<sub>0</sub> locations are initialized based on a given NV, declared read-only variables ρ, and their mapping γ to locations. The InitWorld<sub>d</sub> function (a) changes the qualifier of locations in NV that are declared as read-only in ρ from CK to RD, (b) creates V<sub>0</sub> by copying the rest of the locations of NV that still have qualifier CK, and (c) marks the original version of the locations ℓ in NV that still have qualifier CK as checkpointed (ℓ<sub>ck</sub>). This part corresponds to the step from Row (0) to Row (1) in Fig. 2. The closed configuration of c<sub>0</sub> is evaluated until completion, using the rules in Fig. 6. This execution may undergo several power failures and corresponds to the steps from Row (1) to Row (5) in Fig. 2. Finally, the FinWorld<sub>d</sub> function closes off the atomic region, finalizing the volatile and nonvolatile locations. FinWorld<sub>d</sub> (a) copies the values of volatile locations in V′ that have a checkpointed version into NV′, (b) removes the ck marker from the locations in NV′, i.e., converts ℓ<sub>ck</sub> to ℓ, and (c) replaces the RD qualifier of the locations in NV′ with CK. This corresponds to the step from Row (5) to Row (6) in Fig. 2.
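The three steps of each function can be made concrete with a small Python sketch. The encoding is an assumption made for illustration only: memories are dictionaries from location names to (qualifier, value) pairs, the checkpointed copy ℓ<sub>ck</sub> of a location named `l` is stored under the hypothetical key `l_ck`, and all locations of NV are assumed to carry the qualifier CK when the atomic region is entered.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[str, int]      # (access-mode qualifier, value)
Memory = Dict[str, Cell]    # location name -> cell
CK_SUFFIX = "_ck"           # hypothetical naming scheme for checkpointed copies

def init_world(nv: Memory, rho: Set[str], gamma: Dict[str, str]) -> Tuple[Memory, Memory]:
    """Sketch of InitWorld_d: returns (NV_0, V_0)."""
    read_only = {gamma[x] for x in rho}
    nv0: Memory = {}
    v0: Memory = {}
    for loc, (_, val) in nv.items():
        if loc in read_only:
            nv0[loc] = ("RD", val)              # (a) CK -> RD for read-only locations
        else:
            v0[loc] = ("CK", val)               # (b) volatile working copy
            nv0[loc + CK_SUFFIX] = ("CK", val)  # (c) original becomes the checkpoint
    return nv0, v0

def fin_world(nv: Memory, v: Memory) -> Memory:
    """Sketch of FinWorld_d: commits the volatile log and returns NV_1."""
    nv1: Memory = {}
    for loc, (_, val) in nv.items():
        if loc.endswith(CK_SUFFIX):
            base = loc[: -len(CK_SUFFIX)]
            if base in v:
                val = v[base][1]                # (a) copy the volatile value back
            nv1[base] = ("CK", val)             # (b) drop the ck marker
        else:
            nv1[loc] = ("CK", val)              # (c) RD -> CK
    return nv1
```

Volatile locations without a checkpointed version, such as those allocated by let inside the region, are simply not copied back in this sketch.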

The P-seq rule applies when the next code block is a regular command c. The closed configuration of c with an empty initial set of volatile locations is fully evaluated. This corresponds to the steps from Row (0) to Row (1) and Row (1) to Row (5) in Fig. 3. Then the resulting volatile locations V′ scoped in c are dropped, corresponding to the step from Row (5) to Row (6) in Fig. 3.

Command Execution (Closed Config). We summarize rules for a closed configuration in the top part of Fig. 6. Rule D-step steps the closed command configuration when the corresponding open configuration steps. Next, we explain the trio of power failure, charge, and restore rules. When the energy for this execution is depleted (i.e., g = 0), the D-Crash rule applies, stepping the system to the crash instruction ↓ε # in(b > 0, ↑κ). Next, the D-S-Jit or D-S-aID rule applies and operates on volatile memory based on the execution mode Md. In JIT mode, D-S-Jit checkpoints and stores all volatile memory in nonvolatile locations. In atomic mode, D-S-aID drops all volatile memory locations. Then, D-charge applies and inputs a natural number n > 0 from the energy channel, replenishing the configuration's energy level for re-execution. Finally, the program is restored via D-restore-Jit and D-restore-aID, which copy checkpointed locations into volatile memory. D-restore-Jit drops the checkpointed regions and steps

#### Closed Configuration Semantics for Commands and Crash Instructions

$$\dfrac{\gamma \mid \mathsf{Md} \mid n \mid \mathsf{NV} \mid \mathsf{V} \mid c \;\to\; \gamma \mid \mathsf{Md} \mid n' \mid \mathsf{NV}' \mid \mathsf{V}' \mid c'}{[\chi \rhd \varepsilon] \otimes \gamma \mid \mathsf{Md} \mid n \mid \mathsf{NV} \mid \mathsf{V} \mid c \;\Rightarrow\; [\chi \rhd \varepsilon] \otimes \gamma \mid \mathsf{Md} \mid n' \mid \mathsf{NV}' \mid \mathsf{V}' \mid c'}\;(\text{D-Step})$$

*(Rule D-S-Jit, with premise Md = jit: on a power failure in JIT mode, all volatile memory is checkpointed into nonvolatile locations.)*

$$\dfrac{\mathsf{Md} = \mathsf{aID}(c_{0}) \quad \gamma' \subseteq \gamma \quad range(\gamma') = dom(\mathsf{NV})}{\begin{array}{l} [\chi \rhd \varepsilon] \otimes \gamma \mid \mathsf{Md} \mid \cdot \mid \mathsf{NV} \mid \mathsf{V} \mid {\downarrow}\varepsilon \,\#\, \mathsf{in}(b > 0; {\uparrow}\kappa) \\ \quad\Rightarrow\; [\chi \rhd \varepsilon] \otimes \gamma' \mid \mathsf{Md} \mid \cdot \mid \mathsf{NV} \mid \varepsilon \,\#\, \mathsf{in}(b > 0; {\uparrow}\kappa) \end{array}}\;(\text{D-S-aID})$$

[n :: χ ▷ ε] ⊗ γ | Md | · | NV | ε # in(b > 0; ↑κ) ⇒ [χ ▷ ε] ⊗ γ | Md | n | NV | ↑κ (D-charge)

*(Rules D-Restore-Jit and D-Restore-aID, with premise NV = NV′, NV′<sub>ck</sub>: both copy the checkpointed locations into volatile memory. D-Restore-Jit then drops the checkpointed locations and continues with κ, while D-Restore-aID keeps them and continues with c<sub>0</sub>.)*

Selected expression and command semantics

$$\dfrac{\gamma = \gamma', [x \mapsto \ell] \quad \mathsf{V} = \ell@q \hookrightarrow v, \mathsf{V}' \quad n = n' + 1}{\gamma \mid \mathsf{Md} \mid n \mid \mathsf{NV} \mid \mathsf{V} \mid x \;\to\; \gamma \mid \mathsf{Md} \mid n' \mid \mathsf{NV} \mid \mathsf{V} \mid v}\;(\text{D-V-Read})$$

$$\dfrac{\mathsf{V} = \mathsf{V}', \ell@q \hookrightarrow v' \quad q \neq \mathsf{RD} \quad \gamma = \gamma', [x \mapsto \ell] \quad n = n' + 1}{\gamma \mid \mathsf{Md} \mid n \mid \mathsf{NV} \mid \mathsf{V} \mid x := e \;\to\; \gamma \mid \mathsf{Md} \mid n' \mid \mathsf{NV} \mid \mathsf{V}', \ell@q \hookrightarrow e \mid \mathsf{skip}}\;(\text{D-Assign-V})$$

Fig. 6. Statement steps

to the interrupted command κ, while D-restore-aID keeps the checkpointed regions and steps to the original command c<sub>0</sub> in the atomic region.
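The power failure, charge, and restore steps can be sketched in Python as follows. This is a rough model under assumptions, not the rules themselves: memories are dictionaries from location names to (qualifier, value) pairs, checkpointed copies use a hypothetical `_ck` suffix, the mode is the string `"jit"` or `"atomic"`, and the original command of an atomic region is passed as a separate argument `c0` rather than being carried inside the mode.

```python
from typing import Dict, Iterator, Optional, Tuple

Cell = Tuple[str, int]
Memory = Dict[str, Cell]
CK_SUFFIX = "_ck"  # hypothetical naming scheme for checkpointed copies

def power_off(mode: str, nv: Memory, v: Memory) -> Memory:
    """JIT: checkpoint all volatile locations into NV. Atomic: drop them."""
    saved = dict(nv)
    if mode == "jit":
        for loc, cell in v.items():
            saved[loc + CK_SUFFIX] = cell
    return saved

def charge(stream: Iterator[int]) -> int:
    """Input the next energy amount from the channel; it must be positive."""
    n = next(stream)
    if n <= 0:
        raise ValueError("the energy input is assumed to be greater than zero")
    return n

def restore(mode: str, nv: Memory, kappa: object,
            c0: Optional[object] = None) -> Tuple[Memory, Memory, object]:
    """Copy checkpointed locations into volatile memory and pick what to run next."""
    v = {loc[: -len(CK_SUFFIX)]: cell
         for loc, cell in nv.items() if loc.endswith(CK_SUFFIX)}
    if mode == "jit":
        # JIT: checkpoints are dropped and execution resumes at the interrupted kappa.
        rest = {loc: cell for loc, cell in nv.items() if not loc.endswith(CK_SUFFIX)}
        return rest, v, kappa
    # Atomic: checkpoints are kept and the region restarts from its original command.
    return dict(nv), v, c0
```

The asymmetry is visible in the return values of `restore`: only the atomic branch keeps the checkpoints, since the same region may fail again and must restart from the same state.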

Command/Expression Execution (Open Config). The rules for executing commands and expressions in an open configuration are standard. We present a selection of them at the bottom of Fig. 6. Each step decrements the energy level by one. The rules ensure that a checkpointed location ℓ<sub>ck</sub> in NV is not read by the program, as it could store outdated data, and is not written to, as this would tamper with the checkpointed value.
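As a minimal sketch of how one unit of energy is spent per step, the following Python functions mimic a variable read and an assignment of an already evaluated value on volatile memory. The function names, the dictionary encoding of γ and V, and the use of exceptions for stuck configurations are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Cell = Tuple[str, int]      # (access-mode qualifier, value)
Memory = Dict[str, Cell]

def read_var(gamma: Dict[str, str], energy: int, v: Memory, x: str) -> Tuple[int, int]:
    """Read x from volatile memory, spending one unit of energy."""
    if energy <= 0:
        raise RuntimeError("no energy left: the configuration cannot step")
    _, val = v[gamma[x]]
    return energy - 1, val

def assign_var(gamma: Dict[str, str], energy: int, v: Memory,
               x: str, val: int) -> Tuple[int, Memory]:
    """Write val to x unless x is read-only, spending one unit of energy."""
    if energy <= 0:
        raise RuntimeError("no energy left: the configuration cannot step")
    loc = gamma[x]
    q, _ = v[loc]
    if q == "RD":
        raise PermissionError("read-only locations cannot be written")
    updated = dict(v)
    updated[loc] = (q, val)
    return energy - 1, updated
```

Because γ only maps program variables to ordinary locations in this encoding, a checkpointed copy is never reachable through `read_var` or `assign_var`.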

#### 4.3 Types, Typing Contexts, and Judgments

This section introduces the typing judgments used in our static typing.


Table 1. Typing judgment summary

Types and Static Context. Our types are summarized below. The two modalities stratify types into the varieties stable (τ<sup>s</sup>) and unstable (τ<sup>u</sup>). The base store types int and bool are considered unstable. A type variable v<sub>t</sub> denotes a type in the set {C<sub>unit</sub>, C<sup>atom</sup><sub>A</sub>, C<sup>jit</sup><sub>A</sub>}, and implements the recursive nature of Crash types. We include the connectives ∨ and ⇝ solely for the purpose of defining Crash types; they are not used elsewhere. Defining Crash types using these connectives will allow us to define the logical relation in Sec. 5 based on the intended meaning of its index type. Some well-formed types, e.g., nat ⇝ nat ⇝ ↑unit, are not accepted by our type system introduced in Sec. 4.4. These types have no inhabitants, i.e., no well-typed configuration has one of these types.

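$$\begin{array}{llcl}
\text{store types} & A & ::= & \mathsf{int} \mid \mathsf{bool} \\
\text{basic types} & T & ::= & \mathsf{unit} \mid A \\
\text{stable types} & \tau^{s} & ::= & \mathsf{nat} \leadsto \tau^{s} \mid {\uparrow}\tau^{u} \\
\text{unstable types} & \tau^{u} & ::= & T \mid {\downarrow}\tau^{s} \mid \tau^{u} \vee \tau^{u} \mid v_{t} \\
\text{volatile store typing context} & \Sigma & ::= & \cdot \mid x : {\downarrow}^{s}_{u}{\uparrow}^{s}_{u}A@\mathsf{CK}, \Sigma \\
\text{nonvolatile store typing context} & \Omega & ::= & \cdot \mid x : {\uparrow}^{s}_{u}A@\mathsf{RD}, \Omega \mid x_{ck} : {\uparrow}^{s}_{u}A@\mathsf{CK}, \Omega \mid x : {\uparrow}^{s}_{u}A@\mathsf{CK}, \Omega
\end{array}$$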

A nonvolatile store typing context Ω assigns stable types to nonvolatile location variables, i.e., all variables in Ω have a type of the form ↑<sup>s</sup><sub>u</sub>A. A volatile store typing context Σ assigns unstable types to volatile location variables, i.e., variables in Σ are of the type ↓<sup>s</sup><sub>u</sub>↑<sup>s</sup><sub>u</sub>A. The variable x<sub>ck</sub> refers to a location that has been checkpointed. In the atomic mode, x<sub>ck</sub> has an active volatile log in Σ.

Typing Judgments. Table 1 summarizes all the typing judgments. These judgments are parameterized over the execution mode Md of the expression or command to be typed. The judgment also tracks a variable b corresponding to the current energy level of this execution. The variable b ranges over natural numbers (nat) and is constrained by a relation R ∈ {≥, >} or is set to 0; the constraint b ≥ 0 leaves b unconstrained. The constraint on b determines whether or not a command can evaluate to a value without power failure. There are three judgments for command typing. The first judgment is used when the command has not yet successfully finished

$$\dfrac{\mathsf{jit} \mid b \ge 0 : \mathsf{nat} \mid \Omega; \cdot \vdash_{\emptyset} c : \mathsf{C}_{\mathsf{unit}} \quad b : \mathsf{nat} \mid \Omega \vdash p : {\uparrow}\mathsf{C}_{\mathsf{unit}}}{b : \mathsf{nat} \mid \Omega \vdash c; p : {\uparrow}\mathsf{C}_{\mathsf{unit}}}\;(\text{T-P-Seq})$$

$$\dfrac{\begin{array}{c} \Omega_{0} \mid \Sigma_{0} = \mathsf{InitWorld}_{t}(\Omega; \rho) \\ \mathsf{Sig} = \{\mathsf{aID}(c_{0}) \mid b \ge 0 : \mathsf{nat} \mid \Omega_{0}; \Sigma_{0} \vdash c_{0} : \mathsf{C}_{\mathsf{unit}}\} \\ \mathsf{aID}(c_{0}) \mid b \ge 0 : \mathsf{nat} \mid \Omega_{0}; \Sigma_{0} \vdash_{\mathsf{Sig}} c_{0} : \mathsf{C}_{\mathsf{unit}} \quad b : \mathsf{nat} \mid \Omega \vdash p : {\uparrow}\mathsf{C}_{\mathsf{unit}} \end{array}}{b : \mathsf{nat} \mid \Omega \vdash \mathsf{Ckpt}[\mathsf{aID}, \rho](c_{0}); p : {\uparrow}\mathsf{C}_{\mathsf{unit}}}\;(\text{T-P-Ckpt})$$

Fig. 7. Program typing

executing; its next step, depending on its constraint R, may or may not crash. When the command reaches type ↓↑unit, b no longer needs to be constrained as the execution succeeded without power failure. The second judgment invokes the third judgment to type the configuration after the volatile log is committed: in the typing rule for committing the volatile log, the conclusion is of the form of the second judgment and the premise is of the form of the third. For expression typing, we distinguish expressions on the right of an assignment (being read) from those on the left of an assignment (being written to) via subscripts RD and WT, respectively. The expressions that are being written to are only of the simple form x. As no execution is required to evaluate x, we consider its judgment crash free, so no constraint is required on b. For program typing, we only have one judgment that refers to the type of the program before the execution of its next block starts. The rest of the judgments type states after a crash. The first judgment uses the constraint b = 0, which corresponds to the power failure condition. It invokes the second judgment, which types a state right after a crash. The third judgment types the state awaiting energy to continue re-execution, and the final judgment types the state that is ready for restoration and re-execution.

#### 4.4 Typing Rules

Program Typing. Fig. 7 shows the typing rules for programs. The P-seq rule types program c; p by first typing c under jit mode, requiring b ≥ 0, and then typing the rest of the program. The volatile memory context is empty for now, but will be populated when the let commands allocate new volatile locations.

The P-Ckpt rule types the command c<sub>0</sub> enclosed in an atomic region under the mode aID(c<sub>0</sub>) and then types the rest of the program p. The first premise sets up the initial typing contexts for nonvolatile and volatile memories, as illustrated in Fig. 2. The partial function InitWorld<sub>t</sub> initializes the volatile memory by creating a log of variables in Ω that are not read-only. Ω can be uniquely split into Ω<sup>c</sup> and Ω<sup>r</sup>, where Ω<sup>r</sup> is the set of all read-only locations in Ω, and Ω<sup>c</sup> is the set of all locations that are not read-only. This function is defined below:

$$\Omega_{0} \mid \Sigma_{0} = \mathsf{InitWorld}_{t}(\Omega; \rho) \quad \text{if } \rho \subseteq dom(\Omega),\ \Omega_{0} = \Omega^{r}, \Omega^{c}_{ck} \text{ and } \Sigma_{0} = {\downarrow}\Omega^{c}$$

where Ω = Ω<sup>c</sup>, Ω<sup>r</sup> and Ω<sup>r</sup> = Ω↾ρ.

Here Ω<sup>r</sup> = Ω↾ρ is the subset of Ω whose locations are declared in ρ to be read-only, and Ω<sup>c</sup> contains all other locations in Ω. The context Ω<sup>c</sup><sub>ck</sub> is defined as Ω<sup>c</sup><sub>ck</sub> = {x<sub>ck</sub> : ↑A@q | x : ↑A@q ∈ Ω<sup>c</sup>}, and the context ↓Ω<sup>c</sup> is defined as ↓Ω<sup>c</sup> = {x : ↓↑A@q | x : ↑A@q ∈ Ω<sup>c</sup>}. If the set of read-only variables, ρ, is not contained in the domain of Ω, then the function InitWorld<sub>t</sub> is not defined.
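The definition of InitWorld<sub>t</sub> can be read as a small partial function on typing contexts. The Python sketch below assumes a hypothetical encoding in which a context is a dictionary from variable names to (type, qualifier) pairs, types are plain strings with `"up"` and `"down"` standing for the two shifts, and the checkpointed variable x<sub>ck</sub> is written `x_ck`. Returning `None` models the undefined case.

```python
from typing import Dict, Optional, Set, Tuple

Typing = Tuple[str, str]       # (type, qualifier), e.g. ("up int", "CK")
Context = Dict[str, Typing]    # variable name -> typing

def init_world_t(omega: Context, rho: Set[str]) -> Optional[Tuple[Context, Context]]:
    """Sketch of InitWorld_t: returns (Omega_0, Sigma_0), or None if undefined."""
    if not rho <= set(omega):
        return None                                            # rho must lie in dom(Omega)
    omega_r = {x: t for x, t in omega.items() if x in rho}      # read-only part
    omega_c = {x: t for x, t in omega.items() if x not in rho}  # everything else
    omega0 = dict(omega_r)
    for x, t in omega_c.items():
        omega0[x + "_ck"] = t                  # Omega^c_ck: checkpointed originals
    sigma0 = {x: ("down " + ty, q)             # volatile log: a down shift on each type
              for x, (ty, q) in omega_c.items()}
    return omega0, sigma0
```

For example, with Ω containing `x` and `y` and ρ = {`x`}, the sketch keeps `x` in Ω<sub>0</sub>, adds `y_ck` to it, and places `y` in Σ<sub>0</sub> with a shifted type.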

In rules P-seq and P-ckpt, the command typing judgment in the premise makes use of a signature (subscripts ∅ and Sig, respectively) to type check the command relative to the signature. The signature is populated at different stages of type checking the JIT and atomic regions. In an atomic region, rule T-P-Ckpt populates the signature at the beginning of the region with the initial judgment, which includes the region's original command c<sub>0</sub> and static memory context Ω<sub>0</sub>; Σ<sub>0</sub>. The region is then typed relative to the signature. In JIT mode, the signature is populated later with the judgment just at the point of the failure (rule T-enough?). The program remembers that it built a typing derivation for the judgment in the signature, so that when it restores from a power failure, it refers to the signature and checks that the restored judgment matches the one stored in the signature without needing to derive it again. This makes the typing derivations finitary and inductive.

Command and Expression Typing. Fig. 8 shows selected typing rules for commands. The T-skip rule declares the command skip as the stable type ↑unit. Rule T-∨-Succ applies when the command successfully completes its execution and still has one unit of energy available (b > 0) to conclude the execution. In this case, we close off the energy level variable and continue typing the command against the type ↓↑unit. Rule T-C-shift is invoked by T-∨-Succ and updates the memory typing contexts by removing checkpointed locations in Ω, as they are no longer needed, and making locations in Σ stable, as they are now committed. This corresponds to the last step of Fig. 2.

The rules T-let and T-assign are mostly standard except that we consider crashes. For example, in typing the assign command x := e, the first premise of T-assign considers the type of expression e to be the Crash type C<sup>Md</sup><sub>A</sub>, but in the second premise we require the location x to be of type ↓↑A, i.e., the location only considers the type corresponding to the case where execution of e can be completed successfully. The reason is that the assignment only occurs if the execution of e is successful. The constraint on the energy levels for premises goes back to b ≥ 0, as we use one energy unit to deconstruct these commands.

The rule T-Enough? checks two premises based on the value of b, given b ≥ 0. The third premise, a crash judgment, corresponds to the case where b = 0 (typing rules for crash judgments are given later in this section), and the fourth premise corresponds to the case where b > 0. The condition b > 0 states that there is at least one unit of energy available to decompose one command construct, e.g., via T-let or T-assign. This rule populates the signature for JIT commands. The second premise states that the signature remains intact if the mode is atomic, but is populated by Sig′ if the mode is JIT. In the JIT mode, after a power failure, the command c is restored to itself, and Sig′ remembers that the well-typedness of the command when the energy level is non-negative has been checked already.

Expression typing rules are very similar to those of the commands. Fig. 8 shows a few selected rules. The T-Loc-Write and T-Loc-Read rules match

#### Commands

*(Selected command typing rules, including T-Skip, T-C-Shift, and T-∨-Succ.)*

Expressions

$$\dfrac{\Omega, \Sigma' = x : {\uparrow}A@q, \Omega'_{2} \quad q \neq \mathsf{RD}}{\mathsf{Md} \mid b : \mathsf{nat} \mid \Omega, \Sigma' \vdash_{\mathsf{WT}} x : {\uparrow}A}\;(\text{T-Loc-Write})$$

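$$\dfrac{\Omega = x : {\uparrow}A@q, \Omega'}{\mathsf{Md} \mid b : \mathsf{nat} \mid \Omega \vdash_{\mathsf{RD}} x : {\uparrow}A}\;(\text{T-Loc-Read}) \qquad \dfrac{}{\mathsf{Md} \mid b : \mathsf{nat} \mid \Omega \vdash_{\mathsf{RD}} \mathsf{tt} : {\uparrow}\mathsf{bool}}\;(\text{T-Bool-t})$$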

Fig. 8. Selected command and expression typing

the location variable x with an existing variable inside the context. T-Loc-Write performs an extra check to make sure that x is not a read-only variable.

Statement Typing. Fig. 9 presents the typing rules for crash instructions. The crash is detected by the depleted energy level b = 0 in the T-∨-crash rule. In the premise, the crash instruction ↓ε # in(b > 0, ↑κ′) is typed. In JIT mode, the T-Jit-stop rule brings a checkpointed version of all the volatile variables in Σ inside Ω, since they are checkpointed at that point. In atomic mode, the T-aID-Stop rule simply drops the volatile locations in Σ. The T-charge rule inputs a new energy level from the energy channel ε, regardless of the mode. The first premise requires the energy channel to provide a natural number greater than zero. Finally, the T-Jit-Restore and T-aID-Restore rules prepare and check the rebooted system in JIT and atomic modes, respectively. In both modes, volatile memory is restored from the checkpointed locations in Ω. In the atomic mode, the checkpointed locations persist in Ω as we may need them for the

*(Typing rules for crash instructions: T-∨-Crash, T-Jit-Stop, T-aID-Stop, T-Charge, T-Jit-Restore, and T-aID-Restore.)*

Fig. 9. Crash, restore, and checkpoint typing

next power failure. Alternatively, in the JIT mode, checkpoints are dropped from Ω and execution continues with the expression or command κ, which was running right before the crash. In the atomic mode, execution continues with the original command c<sub>0</sub> enclosed in the atomic region. Instead of retyping the restored judgments, we check if there are already typing derivations by matching them up with the saved judgment in the signature.

# 5 Logical Relation for Intermittent Execution

We establish a logical relation to prove idempotency, which states that every intermittent execution of a program can be simulated by a continuous execution. The logical relation relates an intermittent execution with a continuous one and is indexed by Crash types. A continuous run is one with an infinite energy level, ∞. Crash types are recursive, yielding possibly infinite atomic region re-executions. Thus, we use the maximum number of executions (also power failures) as a step index to stratify our logical relation to ensure its well-foundedness.
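To make the idempotency property tangible, the following Python sketch runs one toy atomic region both continuously and intermittently and compares the resulting nonvolatile memories. It is only an illustration on a single program and a single charge stream, not a proof, and the encoding is assumed: a region is a list of (target, function) assignments, each assignment costs one unit of energy, and the names `PROG`, `run_continuous`, and `run_atomic_intermittent` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Store = Dict[str, int]
Program = List[Tuple[str, Callable[[Store], int]]]

# A region that is not idempotent if replayed directly on nonvolatile memory:
# x := x + 1; y := x * 2
PROG: Program = [
    ("x", lambda m: m["x"] + 1),
    ("y", lambda m: m["x"] * 2),
]

def run_continuous(nv: Store, prog: Program) -> Store:
    """Continuous run: unbounded energy, no power failure."""
    v = dict(nv)
    for target, rhs in prog:
        v[target] = rhs(v)
    return v

def run_atomic_intermittent(nv: Store, prog: Program, charges: Iterable[int]) -> Store:
    """Intermittent run of an atomic region fed by a stream of positive charges."""
    stream = iter(charges)
    energy = next(stream)
    while True:
        v = dict(nv)                 # restore the volatile log from the checkpoint
        completed = True
        for target, rhs in prog:
            if energy == 0:          # power failure in the middle of the region
                completed = False
                break
            v[target] = rhs(v)
            energy -= 1
        if completed and energy > 0: # committing requires some energy left
            return v
        energy = next(stream)        # drop v, recharge, and re-execute from the start
```

With the initial store `{"x": 1, "y": 0}` and the charges `[1, 2, 3]`, the region fails twice (once midway and once with no energy left to commit) before succeeding, and the committed store coincides with the one produced by `run_continuous`. Replaying the assignments directly on nonvolatile memory would instead increment `x` more than once.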

The logical relation (defined in Sec. 5.1) relies on PwOff, Restore, and Commit functions, referred to as power failure, restore, and commit policies, respectively. We establish specific policies for atomic and JIT execution modes. We formalize semantic typing as every atomic and JIT region of the program being logically related to itself. We prove that the semantically well-typed programs are idempotent across power failures in Sec. 5.2. The definitions match the memory operations in the dynamic rules that deal with crash, restore, and re-execution (D-S-aID/D-S-Jit, D-R-aID/D-R-Jit, and D-P-Ckpt/

$$\begin{array}{l} \mathsf{Md} \mid b \ge 0 : \mathsf{nat} \mid \Omega \mid \Sigma \Vdash c_{1} \le c_{2} : \mathsf{C}_{\mathsf{unit}} \quad \text{if } \forall n, m \ge 0.\ \forall \gamma, \mathsf{NV}, \mathsf{V} \text{ s.t. } \mathsf{NV} \mid \mathsf{V} \Vdash \gamma :: \Omega \mid \Sigma. \\ \quad (\gamma \mid \mathsf{Md} \mid n \mid \mathsf{NV} \mid \mathsf{V} \mid c_{1},\ \gamma \mid \mathsf{Md} \mid \infty \mid \mathsf{NV} \mid \mathsf{V} \mid c_{2}) \in \mathcal{E}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{m} \end{array}$$

#### Term Relation

$$\begin{array}{l} \mathcal{E}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{m+1} = \{(\gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid c_{1},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \text{ s.t.} \\ \quad \exists (\gamma'_{1} \mid \mathsf{Md}' \mid n'_{1} \mid \mathsf{NV}'_{1} \mid \mathsf{V}'_{1} \mid c'_{1}) \text{ s.t. } \gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid c_{1} \to^{*}_{\mathsf{irred}} \gamma'_{1} \mid \mathsf{Md}' \mid n'_{1} \mid \mathsf{NV}'_{1} \mid \mathsf{V}'_{1} \mid c'_{1} \\ \quad \wedge\ \exists (\gamma'_{2} \mid \mathsf{Md}' \mid \infty \mid \mathsf{NV}'_{2} \mid \mathsf{V}'_{2} \mid c'_{2}) \text{ s.t. } \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2} \to^{*} \gamma'_{2} \mid \mathsf{Md}' \mid \infty \mid \mathsf{NV}'_{2} \mid \mathsf{V}'_{2} \mid c'_{2} \\ \quad \wedge\ (\gamma'_{1} \mid \mathsf{Md}' \mid n'_{1} \mid \mathsf{NV}'_{1} \mid \mathsf{V}'_{1} \mid c'_{1},\ \gamma'_{2} \mid \mathsf{Md}' \mid \infty \mid \mathsf{NV}'_{2} \mid \mathsf{V}'_{2} \mid c'_{2}) \in \mathcal{V}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{m+1}\} \\[1ex] \mathcal{E}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{0} = \{(\gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid c_{1},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2})\} \end{array}$$

#### Value Relation

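$$\begin{array}{l} \mathcal{V}[\![{\uparrow}\mathsf{unit}]\!]_{m} = \{(\gamma \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{skip},\ \gamma \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{skip}) \text{ s.t. } \mathsf{NV}_{1} = \mathsf{NV}_{2}\} \\[1ex] \mathcal{V}[\![{\downarrow}{\uparrow}\mathsf{unit}]\!]_{m} = \{(\gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid \mathsf{skip},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid \mathsf{skip}) \text{ s.t.} \\ \quad \mathsf{Commit}(\gamma_{i} \mid \mathsf{Md} \mid \mathsf{NV}_{i} \mid \mathsf{V}_{i}) = \gamma'_{1} \mid \mathsf{NV}'_{i} \\ \quad \wedge\ (\gamma'_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}'_{1} \mid \mathsf{skip},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}'_{2} \mid \mathsf{skip}) \in \mathcal{V}[\![{\uparrow}\mathsf{unit}]\!]_{m}\} \\[1ex] \mathcal{V}[\![{\uparrow}\mathsf{C}_{\mathsf{unit}}]\!]_{m} = \{(\gamma_{1} \mid \mathsf{Md} \mid n \mid \mathsf{NV}_{1} \mid {\uparrow}\kappa,\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \text{ s.t.} \\ \quad \mathsf{Restore}(\gamma_{1}, \mathsf{Md}, \mathsf{NV}_{1}, \kappa) = \mathsf{NV}_{0} \mid \mathsf{V}_{0} \mid c_{0} \\ \quad \wedge\ (\gamma_{1} \mid \mathsf{Md} \mid n \mid \mathsf{NV}_{0} \mid \mathsf{V}_{0} \mid c_{0},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \in \mathcal{E}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{m}\} \\[1ex] \mathcal{V}[\![\mathsf{nat} \leadsto {\uparrow}\mathsf{C}_{\mathsf{unit}}]\!]_{m} = \{(\gamma_{1} \mid \mathsf{Md} \mid \cdot \mid \mathsf{NV}_{1} \mid \varepsilon \,\#\, \mathsf{in}(n > 0, {\uparrow}\kappa),\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \text{ s.t.} \\ \quad \forall n > 0.\ (\gamma_{1} \mid \mathsf{Md} \mid n \mid \mathsf{NV}_{1} \mid {\uparrow}\kappa,\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \in \mathcal{V}[\![{\uparrow}\mathsf{C}_{\mathsf{unit}}]\!]_{m}\} \\[1ex] \mathcal{V}[\![{\downarrow}(\mathsf{nat} \leadsto {\uparrow}\mathsf{C}_{\mathsf{unit}})]\!]_{m} = \{(\gamma_{1} \mid \mathsf{Md} \mid \cdot \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid {\downarrow}\varepsilon \,\#\, \mathsf{in}(n > 0, {\uparrow}\kappa),\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \text{ s.t.} \\ \quad \mathsf{PwOff}(\gamma_{1}, \mathsf{Md}, \mathsf{NV}_{1}, \mathsf{V}_{1}) = \gamma'_{1} \mid \mathsf{V}' \\ \quad \wedge\ (\gamma'_{1} \mid \mathsf{Md} \mid \cdot \mid \mathsf{V}', \mathsf{NV}_{1} \mid \varepsilon \,\#\, \mathsf{in}(n > 0, {\uparrow}\kappa),\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \in \mathcal{V}[\![\mathsf{nat} \leadsto {\uparrow}\mathsf{C}_{\mathsf{unit}}]\!]_{m}\} \\[1ex] \mathcal{V}[\![\mathsf{C}_{\mathsf{unit}}]\!]_{m+1} = \{(\gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid c_{1},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \text{ s.t. either} \\ \quad n_{1} = 0 \wedge (\gamma_{1} \mid \mathsf{Md} \mid \cdot \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid {\downarrow}\varepsilon \,\#\, \mathsf{in}(n_{1} > 0, {\uparrow}c_{1}),\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \in \mathcal{V}[\![{\downarrow}(\mathsf{nat} \leadsto {\uparrow}\mathsf{C}_{\mathsf{unit}})]\!]_{m}, \text{ or} \\ \quad n_{1} > 0 \wedge (\gamma_{1} \mid \mathsf{Md} \mid n_{1} \mid \mathsf{NV}_{1} \mid \mathsf{V}_{1} \mid c_{1},\ \gamma_{2} \mid \mathsf{Md} \mid \infty \mid \mathsf{NV}_{2} \mid \mathsf{V}_{2} \mid c_{2}) \in \mathcal{V}[\![{\downarrow}{\uparrow}\mathsf{unit}]\!]_{m}\} \end{array}$$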

Fig. 10. Logical relation

D-P-seq) for atomic and JIT regions. We prove that our syntactically well-typed programs are semantically well-typed. We generalize semantic typing rules, allowing custom power failure, restore, and commit policies (Sec. 5.3).

#### 5.1 Semantic Typing via a Logical Relation

The logical relation, written Md | b ≥ 0 : nat | Ω | Σ ⊩ c<sub>1</sub> ≤ c<sub>2</sub> : C<sub>unit</sub>, is defined in Fig. 10 by a lexicographic induction on the index m and the structure of the types. The judgment NV | V ⊩ γ :: Ω | Σ in the definition states that γ maps the variables in Σ and Ω to locations in V and NV, respectively, such that their qualifiers and types match. Similar to prior work [2,16,42], our definition consists of a term relation ℰ⟦C<sub>unit</sub>⟧<sub>m</sub> and a value relation 𝒱⟦τ⟧<sub>m</sub>.

Term Relation. A pair of open command configurations of type $C_{\mathsf{unit}}$ is in the term relation of index $m$ if any intermittent execution of the first one after $m$ power failures is indistinguishable from a continuous execution of the second one. In particular, for index $m+1$, the term relation relates two configurations at type $C_{\mathsf{unit}}$ if the first configuration eventually steps to a value (or "irreducible") configuration, i.e., it either evaluates to $\mathsf{skip}$ or its energy level depletes ($n'_1 = 0$), and the second configuration can take zero or more steps such that the pair continues to be in the value relation $\mathcal{V}[\![C_{\mathsf{unit}}]\!]_{m+1}$. When the index is $m = 0$, no execution is observed, so any two configurations are in the term relation. Here, $\mathsf{irred}$ refers to $\gamma'_1 \mid \mathit{Md}' \mid n'_1 \mid \mathit{NV}'_1 \mid V'_1 \mid c'_1$ being an irreducible configuration, i.e., it cannot take any more steps. Since our semantics for commands is deterministic, for each configuration $\gamma_1 \mid \mathit{Md} \mid n_1 \mid \mathit{NV}_1 \mid V_1 \mid c_1$ there is exactly one such irreducible configuration.

Value Relation. The value relation is defined based on the intended meaning of the type, and relates two value configurations that will have the same effect on the stores. The value relation relates two open command configurations at type $C_{\mathsf{unit}}$ and index $m+1$ if either (a) the first configuration has faced a power failure, and the two configurations continue to relate by $\mathcal{V}[\![{\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}})]\!]_{m}$, or (b) the first configuration executed successfully without any power failures, and the two configurations are related by $\mathcal{V}[\![{\downarrow}{\uparrow}\mathsf{unit}]\!]_{m}$. This definition matches the disjunctive nature of type $C_{\mathsf{unit}}$, which is recursively defined in the signature as ${\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}) \vee {\downarrow}{\uparrow}\mathsf{unit}$. Since we unfold the recursive definition of $C_{\mathsf{unit}}$, we decrease the index from $m+1$ to $m$ to ensure the relation's well-foundedness. Note that the value relation is neither defined nor called for $C_{\mathsf{unit}}$ at index 0.

The value relations in the third, fourth, and fifth rows of Fig. 10 are defined based on the type of the first configuration; the second configurations in these relations continue to be of type $C_{\mathsf{unit}}$. Only in the relations defined in the first and second rows of Fig. 10 do the types of both configurations match the indexed type of the relation. Hence, the value relation has varying arity: in the first and second rows of Fig. 10, the relation is binary, while in the rest, the relation degenerates to unary, with the second configuration as its Kripke world [18].

The value relation at type ${\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}})$ relates two configurations if the first one runs the crash instruction ${\downarrow}\varepsilon \,\#\, \mathsf{in}(n > 0, {\uparrow}\kappa)$ and a power failure policy creates a checkpoint of volatile locations such that the configurations continue to be in the value relation at type $\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}$. In atomic mode, the power failure function checkpoints none of the volatile locations, i.e., $\mathsf{PwOff}(\gamma, \mathsf{aID}(c_0), \mathit{NV}_1, V_1) = \gamma' \mid \emptyset$, where $\gamma'$ is the largest restriction of $\gamma$ with $\mathrm{range}(\gamma') = \mathrm{dom}(\mathit{NV}_1)$. In JIT mode, it checkpoints all volatile locations, i.e., $\mathsf{PwOff}(\gamma, \mathsf{jit}, \mathit{NV}_1, V_1) = \gamma \mid V_1$.
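The two power failure policies can be sketched in OCaml as follows. This is only a toy model under stated assumptions: memories and the mapping $\gamma$ are association lists, and the names `pw_off`, `memory`, `mapping`, and `mode` are hypothetical, not taken from the paper's formalization.

```ocaml
(* Toy model (assumption): memories and mappings are association lists. *)
type loc = string
type memory = (loc * int) list        (* location -> stored value *)
type mapping = (string * loc) list    (* variable -> location *)
type mode = Atomic | Jit

(* Returns the mapping that survives the power failure together with
   the volatile locations that are checkpointed. *)
let pw_off (gamma : mapping) (md : mode) (nv : memory) (v : memory)
  : mapping * memory =
  match md with
  | Atomic ->
    (* Restrict gamma to variables that point into nonvolatile memory
       and checkpoint nothing. *)
    let gamma' = List.filter (fun (_, l) -> List.mem_assoc l nv) gamma in
    (gamma', [])
  | Jit ->
    (* Keep the whole mapping and checkpoint every volatile location. *)
    (gamma, v)

let () =
  let gamma = [ ("x", "l1"); ("t", "l2") ] in
  let nv = [ ("l1", 5) ] and v = [ ("l2", 7) ] in
  let g_atomic, ck_atomic = pw_off gamma Atomic nv v in
  let g_jit, ck_jit = pw_off gamma Jit nv v in
  Printf.printf "atomic: %d variables kept, %d locations checkpointed\n"
    (List.length g_atomic) (List.length ck_atomic);
  Printf.printf "jit: %d variables kept, %d locations checkpointed\n"
    (List.length g_jit) (List.length ck_jit)
```

In this sketch the atomic policy forgets the volatile variable `t` entirely, whereas the JIT policy preserves it; this mirrors the difference between $\gamma' \mid \emptyset$ and $\gamma \mid V_1$ above.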

The value relation at type $\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}$ is defined similarly to a function type in a value relation and requires the configurations to be related at type ${\uparrow}C_{\mathsf{unit}}$ for every energy input level $n$ provided to the first configuration.

The value relation at type ${\uparrow}C_{\mathsf{unit}}$ requires the first configuration to run the crash instruction ${\uparrow}\kappa$. The defined restore policy restores the nonvolatile memory $\mathit{NV}_0$, volatile memory $V_0$, and re-execution command $c_0$ such that the configurations continue to be related in the term interpretation at type $C_{\mathsf{unit}}$. In atomic mode, the restore function is defined as $\mathsf{restore}(\gamma, \mathsf{aID}(c), \mathit{NV}_1, \kappa) = \mathit{NV}_1 \mid \mathit{NV}'' \mid c$ where $\mathit{NV}_1 = \mathit{NV}', \mathit{NV}''_{ck}$. In JIT mode, the restore function is defined as $\mathsf{restore}(\gamma, \mathsf{jit}(c), \mathit{NV}_1, \kappa) = \mathit{NV}'_1 \mid \mathit{NV}'' \mid c$ where $\mathit{NV}_1 = \mathit{NV}', \mathit{NV}''_{ck}$. We write $\mathit{NV}_1 = \mathit{NV}', \mathit{NV}''_{ck}$ to state that $\mathit{NV}_1$ can be uniquely partitioned into all locations ($\mathit{NV}''_{ck}$) that are checkpointed, i.e., of the form $\ell_{ck}$, and regular locations ($\mathit{NV}'$) of the form $\ell$. $\mathit{NV}''$ is the non-checkpointed version of $\mathit{NV}''_{ck}$, which can be retrieved by removing the $ck$ subscript from every location in $\mathit{NV}''_{ck}$.
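The partition $\mathit{NV}_1 = \mathit{NV}', \mathit{NV}''_{ck}$ and the atomic restore policy can be illustrated with a small OCaml sketch. The encoding is an assumption made for illustration: a location is tagged `Reg` (form $\ell$) or `Ck` (form $\ell_{ck}$), and `split_ck`, `unck`, and `restore_atomic` are hypothetical names.

```ocaml
(* Assumption: a location is either regular (l) or checkpointed (l_ck). *)
type loc = Reg of string | Ck of string
type memory = (loc * int) list

(* Partition NV1 into regular locations NV' and checkpointed ones NV''_ck. *)
let split_ck (nv : memory) : memory * memory =
  List.partition (fun (l, _) -> match l with Reg _ -> true | Ck _ -> false) nv

(* Remove the ck subscript from every location, yielding NV''. *)
let unck (nv_ck : memory) : memory =
  List.map
    (fun (l, v) -> match l with Ck x -> (Reg x, v) | Reg _ -> (l, v))
    nv_ck

(* Atomic restore as described in the text: the nonvolatile memory is
   kept, the volatile memory is rebuilt from the checkpointed copies,
   and execution restarts from the saved command. *)
let restore_atomic (saved : 'cmd) (nv1 : memory) : memory * memory * 'cmd =
  let _regular, nv_ck = split_ck nv1 in
  (nv1, unck nv_ck, saved)
```

For instance, restoring from `[(Reg "a", 1); (Ck "b", 2)]` leaves the nonvolatile memory unchanged and produces the volatile memory `[(Reg "b", 2)]`.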

The value relation at type ${\downarrow}{\uparrow}\mathsf{unit}$ requires both configurations to run $\mathsf{skip}$, and the defined commit policy creates nonvolatile memories for both runs such that they continue to be related at type ${\uparrow}\mathsf{unit}$. In atomic mode, the commit function is defined to replace the checkpointed locations in the nonvolatile memory with their volatile log, i.e., $\mathsf{Commit}(\gamma \mid \mathsf{aID}(c_0) \mid \mathit{NV}_1 \mid V_1) = \gamma' \mid \mathit{NV}'_1 \mid V''$, where $\mathit{NV}_1 = \mathit{NV}'_1, \mathit{NV}''_{ck}$ and $V_1 = V'_1, V''$ and $\mathrm{dom}(V'') = \mathrm{dom}(\mathit{NV}'')$. Moreover, $\gamma' \subseteq \gamma$, with $\mathrm{range}(\gamma') = \mathrm{dom}(\mathit{NV}_1) \cup \mathrm{dom}(V'')$. In JIT mode, the commit function simply drops all volatile memory, i.e., $\mathsf{Commit}(\gamma \mid \mathsf{jit} \mid \mathit{NV}_1 \mid V_1) = \gamma' \mid \mathit{NV}_1$, where $\gamma' \subseteq \gamma$ with $\mathrm{range}(\gamma') = \mathrm{dom}(\mathit{NV}_1)$.
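A minimal sketch of the two commit policies, under one reading of the definition above: in atomic mode the committed nonvolatile memory consists of the regular locations $\mathit{NV}'_1$ together with the volatile log $V''$ of the checkpointed locations. The types and the name `commit` are hypothetical, and the mapping $\gamma'$ is omitted for brevity.

```ocaml
(* Assumption: same tagged-location encoding as a toy model. *)
type loc = Reg of string | Ck of string
type memory = (loc * int) list
type mode = Atomic | Jit

let commit (md : mode) (nv : memory) (v : memory) : memory =
  match md with
  | Jit ->
    (* JIT: drop all volatile memory, keep nonvolatile memory as is. *)
    nv
  | Atomic ->
    (* Atomic: split NV into regular and checkpointed locations. *)
    let regular, ck =
      List.partition
        (fun (l, _) -> match l with Reg _ -> true | Ck _ -> false)
        nv
    in
    (* V'': the volatile entries whose location has a checkpointed copy. *)
    let logged =
      List.filter
        (fun (l, _) ->
           match l with
           | Reg x -> List.mem_assoc (Ck x) ck
           | Ck _ -> false)
        v
    in
    (* Replace the checkpointed copies by their volatile log. *)
    regular @ logged
```

With `nv = [(Reg "a", 1); (Ck "u", 2)]` and `v = [(Reg "u", 9); (Reg "t", 3)]`, the atomic commit yields `[(Reg "a", 1); (Reg "u", 9)]`: the logged value of `u` replaces its checkpoint, and the unrelated volatile location `t` is discarded.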

The value relation at type ${\uparrow}\mathsf{unit}$ requires the successful executions to store the same values in their memories, i.e., $\mathit{NV}_1 = \mathit{NV}_2$.
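Putting the rows together, the dependencies between the relations can be summarized as below. This is a hedged reading of the prose rather than a restatement of Fig. 10: it assumes that the intermediate value relations keep the index $m$ reached after unfolding $C_{\mathsf{unit}}$, and the arrow $\longrightarrow$ reads "is defined in terms of".

$$
\begin{array}{rcl}
C_{\mathsf{unit}} & \triangleq & {\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}) \;\vee\; {\downarrow}{\uparrow}\mathsf{unit} \\[6pt]
\mathcal{E}[\![C_{\mathsf{unit}}]\!]_{m+1} & \longrightarrow & \mathcal{V}[\![C_{\mathsf{unit}}]\!]_{m+1} \\[2pt]
\mathcal{V}[\![C_{\mathsf{unit}}]\!]_{m+1} & \longrightarrow & \mathcal{V}[\![{\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}})]\!]_{m} \ \text{ or } \ \mathcal{V}[\![{\downarrow}{\uparrow}\mathsf{unit}]\!]_{m} \\[2pt]
\mathcal{V}[\![{\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}})]\!]_{m} & \longrightarrow & \mathcal{V}[\![\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}]\!]_{m} \ \longrightarrow \ \mathcal{V}[\![{\uparrow}C_{\mathsf{unit}}]\!]_{m} \ \longrightarrow \ \mathcal{E}[\![C_{\mathsf{unit}}]\!]_{m} \\[2pt]
\mathcal{V}[\![{\downarrow}{\uparrow}\mathsf{unit}]\!]_{m} & \longrightarrow & \mathcal{V}[\![{\uparrow}\mathsf{unit}]\!]_{m} \ \longrightarrow \ \mathit{NV}_1 = \mathit{NV}_2
\end{array}
$$

Every cycle through $C_{\mathsf{unit}}$ passes through the unfolding step, which lowers the index by one, so the chain terminates at index 0, where the term relation holds trivially.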

Semantic Typing. A program is semantically well-typed if every JIT and atomic region of it is self-related under our logical relation.

$$\frac{\mathsf{jit} \mid b \ge 0 : \mathsf{nat} \mid \Omega; \cdot \Vdash c \le c : C_{\mathsf{unit}} \qquad b : \mathsf{nat} \mid \Omega \Vdash p : {\uparrow}C_{\mathsf{unit}}}{b : \mathsf{nat} \mid \Omega \Vdash c; p : {\uparrow}C_{\mathsf{unit}}}\ (\text{P-Seq-semantic})$$

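$$\frac{\Omega_0 \mid \Sigma_0 = \mathsf{InitWorld}_t(\Omega; \rho) \qquad \mathsf{aID}(c_0) \mid b \ge 0 : \mathsf{nat} \mid \Omega_0; \Sigma_0 \Vdash c_0 \le c_0 : C_{\mathsf{unit}} \qquad b : \mathsf{nat} \mid \Omega \Vdash p : {\uparrow}C_{\mathsf{unit}}}{b : \mathsf{nat} \mid \Omega \Vdash \mathsf{Ckpt}[\mathsf{aID}, \rho](c_0); p : {\uparrow}C_{\mathsf{unit}}}\ (\text{P-Ckpt-semantic})$$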

#### 5.2 Semantic Typing for Idempotency

The fundamental theorem of our logical relation states that syntactically well-typed programs are also semantically well-typed by proving that syntactically well-typed JIT and atomic regions are self-related. We state and prove the theorem in Sec. 6 but devote this section to explaining why being self-related implies idempotency. We explain it separately for JIT and atomic blocks.

Stepping a JIT block. Consider a program of the form $[\chi_1 \triangleright \varepsilon] \otimes \gamma_1 \mid n \mid \mathit{NV}_1 \mid c_1; p$ that can take a step to $[\chi_k \triangleright \varepsilon] \otimes \gamma \mid n'_k \mid \mathit{NV}'_k \mid p$ via the D-P-Seq rule. By the D-P-Seq rule, we know that the command $c_1$ is successfully executed to completion with possibly $m$-many power failures along the way: $[\chi_1 \triangleright \varepsilon] \otimes \gamma_1 \mid \mathsf{jit} \mid n \mid \mathit{NV}_1 \mid \cdot \mid c_1 \Rightarrow^{*} [\chi_k \triangleright \varepsilon] \otimes \gamma'_k \mid \mathsf{jit} \mid n'_k \mid \mathit{NV}'_k \mid V'_k \mid \mathsf{skip}$. Our goal is to simulate this execution in a continuous setting. To model a continuous run, we run the configuration with energy level $\infty$: $[\chi \triangleright \varepsilon] \otimes \gamma_1 \mid \mathsf{jit} \mid \infty \mid \mathit{NV}_1 \mid \cdot \mid c_1 \Rightarrow^{*} [\chi \triangleright \varepsilon] \otimes \gamma'_j \mid \mathsf{jit} \mid \infty \mid \mathit{NV}'_j \mid V'_j \mid \mathsf{skip}$.

Fig. 11 shows the construction of the simulation. We start with the assumption that the configuration with energy level $n$ is self-related when given energy level $\infty$ for every index, including $m + 1$ (point (1) in Fig. 11). We show that if the first configuration takes one or more steps, the second configuration can take zero or more steps so that the intermediate regions continue to relate.

By definition of the term interpretation, $c_1$ in the first configuration is executed until the first power failure occurs. Moreover, by the relation, we can execute $c_1$ in the second configuration, too, such that the resulting configurations remain related (point (2) in Fig. 11) by the value interpretation at type $C_{\mathsf{unit}}$. The first configuration takes a step from point (2) to point (3) using the D-crash rule by the computational semantics. By the definition of the logical relation, the two configurations continue to be related by the value interpretation at type ${\downarrow}(\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}})$. Then the first configuration takes a step from point (3) to point (4) by the D-S-Jit rule; in this case, we know (by the assumptions of the rule) $V' = V'_1$ and $\gamma''_1 = \gamma$. This matches the definition of the power-off policy for JIT blocks (see Sec. 5.1), and thus the two configurations remain related by the value relation at type $\mathsf{nat} \rightsquigarrow {\uparrow}C_{\mathsf{unit}}$. Next, the first configuration takes a step to point (5) by inputting a new energy level ($n_2$) from the environment. By the definition of the value relations, the two configurations will remain related by the value interpretation at type ${\uparrow}C_{\mathsf{unit}}$.

Finally, the configuration steps to point (6) by D-Restore-Jit, which copies all checkpointed locations inside the volatile memory and continues by running the interrupted command $\kappa$, i.e., here $\mathit{NV}_0 = \mathit{NV}'_1$ and $V_0 = V' = V'_1$ and $c_0 = \kappa$. This matches the restore policy defined for JIT regions; thus, the configurations continue to be related by the term relation at type $C_{\mathsf{unit}}$, similar to what we had earlier at point (1) in Fig. 11, but with fewer power failures remaining.

Now, when the first configuration finally steps to point (8), by the definition of the logical relation, we know that the second configuration steps into $\mathsf{skip}$ too. Thus, we can apply the D-Ckpt rule on the second configuration. The volatile memory $V'_j$ is dropped, and the mapping is reset to $\gamma$, i.e., it matches the commit policy defined for JIT blocks in the logical relation. By Fig. 11-d, we get $\mathit{NV}'_j = \mathit{NV}'_k$, which completes deriving our goal.

Stepping an atomic region. We can build the desired simulation by taking the same steps described for a JIT region. Similarly, the key point is that the power-off and restore policies exactly match how the rules D-S-aID and D-restore-aID, respectively, handle nonvolatile and volatile memories, and the commit policy corresponds to the FinWorld function in the D-ckpt rule.

We showed that our logical relation ensures idempotency for JIT and atomic regions. In the next section, we show that our logical relation formalizes a semantic typing to ensure idempotency of more general policies.

Fig. 11. Why the logical relation is enough.

#### 5.3 More General Policies

We utilize our semantic typing to allow custom policies for power failure, restore, and commit. We extend the grammar of programs as $p := \cdot \mid \mathsf{Reg}[\mathsf{aID}, \overrightarrow{arg}](c); p$, where $\overrightarrow{arg}$ refers to the arguments that the programmer decides to pass to the region for initialization. To each region, we assign a unique identifier $\mathsf{aID}$ that is associated with the three policies and two functions $\mathsf{InitGeneral}_t$ and $\mathsf{InitGeneral}_d$ to initialize the static and dynamic memories, respectively. We add the following semantic typing rule for the general regions:

$$\frac{\Omega_0 \mid \Sigma_0 = \mathsf{InitGeneral}_t(\Omega; \mathsf{aID}; c; \overrightarrow{arg}) \qquad \mathsf{aID}(c_0) \mid b \ge 0 : \mathsf{nat} \mid \Omega_0; \Sigma_0 \Vdash c_0 \le c_0 : C_{\mathsf{unit}} \qquad b : \mathsf{nat} \mid \Omega \Vdash p : {\uparrow}C_{\mathsf{unit}}}{b : \mathsf{nat} \mid \Omega \Vdash \mathsf{Reg}[\mathsf{aID}, \overrightarrow{arg}](c); p : {\uparrow}C_{\mathsf{unit}}}\ (\text{P-Reg-semantic})$$

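For a self-related region to be idempotent, its policies Commit, PwOff, and Restore must match the dynamics, so we add dynamic rules for custom regions in Fig. 12. The JIT and atomic region policies and their dynamic rules are instances of these general policies. As an example, the programmer can customize the policies of the first block of Fig. 1 to not checkpoint variable u. The program remains idempotent as the atomic region never reads u before writing to it. This policy is implemented by real systems [23,24,41]. Our static typing rules can be extended to reason about them as shown in the companion technical report.

$$\frac{\gamma_0 \mid \mathit{NV}_0 \mid V_0 \mid c_0 = \mathsf{restore}(\mathit{NV}, V, \kappa, \mathit{Md}, \gamma)}{[\chi \triangleright \varepsilon] \otimes \gamma \mid \mathit{Md} \mid n \mid \mathit{NV} \mid {\uparrow}\kappa \;\Rightarrow\; [\chi \triangleright \varepsilon] \otimes \gamma_0 \mid \mathit{Md} \mid n \mid \mathit{NV}_0 \mid V_0 \mid c_0}\ (\text{D-R-Reg})$$

$$\frac{\begin{array}{c} n > 0 \qquad \mathsf{InitGeneral}_d(\mathit{NV}; \mathsf{aID}; c; \gamma; \overrightarrow{arg}) = c_0, \mathit{NV}_0, V_0 \\ [\chi \triangleright \varepsilon] \otimes \mathsf{aID}(c_0) \mid n \mid \mathit{NV}_0 \mid V_0 \mid c_0 \;\Rightarrow^{*}\; [\chi' \triangleright \varepsilon] \otimes \mathsf{aID}(c_0) \mid n' \mid \mathit{NV}' \mid V' \mid \mathsf{skip} \\ n' > 0 \qquad \mathit{NV}_1 = \mathsf{Commit}(\mathit{NV}'; V'; \mathsf{aID}; \overrightarrow{arg}) \end{array}}{[\chi \triangleright \varepsilon] \otimes \gamma \mid n \mid \mathit{NV} \mid \mathsf{Reg}[(\mathsf{aID}; arg)](c); p \;\Rightarrow\; [\chi' \triangleright \varepsilon] \otimes \gamma \mid n' \mid \mathit{NV}_1 \mid p}\ (\text{D-Reg})$$

$$\frac{V' = \mathsf{PwOff}(\mathit{NV}, V, \mathit{Md}, \gamma)}{[\chi \triangleright \varepsilon] \otimes \gamma \mid \mathit{Md} \mid \cdot \mid \mathit{NV} \mid V \mid {\downarrow}\varepsilon \,\#\, \mathsf{in}(b > 0; {\uparrow}\kappa) \;\Rightarrow\; [\chi \triangleright \varepsilon] \otimes \gamma \mid \mathit{Md} \mid \cdot \mid \mathit{NV}, V' \mid \varepsilon \,\#\, \mathsf{in}(b > 0; {\uparrow}\kappa)}\ (\text{D-S-Reg})$$

Fig. 12. Custom dynamic rules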

# 6 Metatheory

This section establishes the main properties of the system, which are progress and preservation, adequacy, and the most important result: the fundamental theorem where we prove that statically well-typed programs are semantically well-typed. The theorems and their complete proofs are provided in the companion TR [15].

The progress and preservation theorems assume memory locations to be well-formed, $\vdash^{\mathit{Md}}_{\gamma} \mathit{NV} \mid V : \Omega \mid \Sigma$, which is defined similarly to the $\mathit{NV} \mid V \Vdash \gamma :: \Omega \mid \Sigma$ used in the logical relation, but imposes extra conditions based on the execution mode $\mathit{Md}$. It states that $\gamma$ maps variables in contexts $\Omega$ and $\Sigma$ to the nonvolatile and volatile memories, $\mathit{NV}$ and $V$, respectively, such that their qualifiers and the type of the stored values match. Moreover, it requires specific properties on the contexts depending on $\mathit{Md}$; in atomic mode, each checkpointed location in $\mathit{NV}$ and $\Omega$ must have copies in $V$ and $\Sigma$. We state the theorems below.

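Theorem 1 (Progress for Commands). If $\mathit{Md} \mid b \mathrel{R} m : \mathsf{nat} \mid \Omega; \Sigma \vdash_{Sig} c : \tau$, then $\forall n : \mathsf{nat}$ with $n \mathrel{R} m$ and $\forall \gamma, \mathit{NV}, V$ with $\vdash^{\mathit{Md}}_{\gamma} \mathit{NV} \mid V : \Omega \mid \Sigma$, either $\gamma \mid \mathit{Md} \mid n \mid \mathit{NV} \mid V \mid c$ is a value, or for some configuration $\gamma' \mid \mathit{Md}' \mid n' \mid \mathit{NV}' \mid V' \mid c'$ we have $\gamma \mid \mathit{Md} \mid n \mid \mathit{NV} \mid V \mid c \to \gamma' \mid \mathit{Md}' \mid n' \mid \mathit{NV}' \mid V' \mid c'$. Moreover, if $\mathit{Md}$ is an atomic mode, we have $\mathit{NV}' = \mathit{NV}$.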

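Theorem 2 (Preservation for Commands). If $\mathit{Md} \mid b \ge 0 : \mathsf{nat} \mid \Omega; \Sigma \vdash_{Sig} c : \tau$, and for some $\vdash^{\mathit{Md}}_{\gamma} \mathit{NV} \mid V : \Omega \mid \Sigma$ and $n : \mathsf{nat} \ge 0$, we have $\gamma \mid \mathit{Md} \mid n \mid \mathit{NV} \mid V \mid c \to \gamma' \mid \mathit{Md} \mid n' \mid \mathit{NV}' \mid V' \mid c'$, then for some $\Sigma_1$, we have $\mathit{Md} \mid b \ge 0 : \mathsf{nat} \mid \Omega; \Sigma_1 \vdash_{Sig} c' : \tau$, where $\vdash^{\mathit{Md}}_{\gamma'} \mathit{NV}' \mid V' : \Omega \mid \Sigma_1$ and $n' \ge 0$.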

Theorem 3 (Fundamental Theorem). If $b : \mathsf{nat} \mid \Omega \vdash p : {\uparrow}C_{\mathsf{unit}}$, then $b : \mathsf{nat} \mid \Omega \Vdash p : {\uparrow}C_{\mathsf{unit}}$.

Fig. 13. Proof of the fundamental theorem for P-Ckpt

The proof of Theorem 3 is by induction on the static typing derivation for $p$ and considers the last step in the derivation. Fig. 13 explains the idea of the proof for the case where P-Ckpt is the last step of the derivation. By inversion, $p = \mathsf{Ckpt}[\mathsf{aID}, \rho](c); p'$. Also, $c$ is well-typed for static contexts $\Omega'$ and $\Sigma$, where $\Omega' = \Omega'', \Sigma_{ck}$. The goal is to establish point (1) in the figure: $c$ is related to itself in the term interpretation for arbitrary $n$, $m$, $\gamma$, $\mathit{NV}$, and $V$ where $\mathit{NV} \mid V \Vdash \gamma :: \Omega'', \Sigma_{ck} \mid \Sigma$. The last condition enforces that the static contexts match the dynamic context. The condition also establishes the more refined well-formedness condition $\vdash^{\mathit{Md}}_{\gamma} \mathit{NV} \mid V : \Omega \mid \Sigma$ in atomic mode, required by progress and preservation, since it enforces that each checkpointed location in $\mathit{NV}$ and $\Omega$ has copies in $V$ and $\Sigma$. In particular, $\mathit{NV} = \mathit{NV}', V_{ck}$ and $\mathrm{range}(\gamma) = \mathrm{dom}(\mathit{NV})$. When $m = 0$, the proof is trivial. Consider the case where $m = k + 1$. By the progress and preservation theorems, the first configuration can take multiple steps until it becomes a value $\gamma_1 \mid \mathsf{aID}(c) \mid n' \mid \mathit{NV} \mid V_1 \mid c_1$ that continues to be well-typed. If $n' > 0$, the second configuration steps similarly to completion and establishes that the two resulting configurations are in the value relation. This case is not shown in the figure. If $n' = 0$, the second configuration does not step and instead reaches point (2) in Fig. 13. At point (2), the proof must show that the configurations are in the value interpretation at type $C_{\mathsf{unit}}$.

The dashed line in the figure states that establishing point (2) implies the relation in point (1). The cascade of implications (dashed lines) follows the definition of the value relations at each type. At each step, we invert on the typing rule of the open configuration and show that runtime memories stay well-defined for static contexts. At point (4), we apply the power failure policy for atomic regions, which drops the volatile memory $V_1$ and creates a mapping using the domain of $\mathit{NV}$. By the prior conditions established, we know the created mapping is the original mapping $\gamma$. At point (6), we apply the restore policy for atomic regions, which creates a new volatile memory based on $\mathit{NV}$. Again by the prior conditions established, we know the volatile memory created is the original volatile memory $V$. The goal at point (6) is similar to our original goal at point (1), except that the proof uses an inductive argument to relate the two configurations at index $k$.

Finally, the Adequacy Theorem states that semantically well-typed programs are idempotent, as defined below. The proof is illustrated in Section 5.2.

Definition 1 (Idempotency). A triple of a program $p$, nonvolatile memory $\mathit{NV}$, and a mapping $\gamma$ is idempotent if every intermittent execution of the program can be simulated by a continuous execution of it: for all $n, n', \chi_1, \chi'_1, \mathit{NV}', p'$, if $[\chi_1 \triangleright \varepsilon] \otimes \gamma \mid n \mid \mathit{NV} \mid p \Rightarrow [\chi'_1 \triangleright \varepsilon] \otimes \gamma \mid n' \mid \mathit{NV}' \mid p'$, then $[\chi_2 \triangleright \varepsilon] \otimes \gamma \mid \infty \mid \mathit{NV} \mid p \Rightarrow [\chi_2 \triangleright \varepsilon] \otimes \gamma \mid \infty \mid \mathit{NV}' \mid p'$.
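The intuition behind Definition 1 can be illustrated with a toy OCaml experiment. It is not the paper's semantics: the region body, the failure model (every failure strikes after the body's write but before the region completes), and the names `body`, `run_unprotected`, and `run_atomic` are all assumptions made for illustration.

```ocaml
(* A region body that increments a nonvolatile counter. Memory access
   is abstracted as read/write callbacks. *)
let body (read : unit -> int) (write : int -> unit) : unit =
  write (read () + 1)

(* Writes go straight to nonvolatile memory; each of the `failures`
   interrupted attempts has already performed its write. *)
let run_unprotected (x0 : int) (failures : int) : int =
  let nv = ref x0 in
  for _i = 0 to failures do
    body (fun () -> !nv) (fun v -> nv := v)
  done;
  !nv

(* Atomic-style execution: each attempt works on a fresh volatile copy
   that is lost on failure; only the completed attempt commits. *)
let run_atomic (x0 : int) (failures : int) : int =
  let nv = ref x0 in
  for attempt = 0 to failures do
    let log = ref !nv in
    body (fun () -> !log) (fun v -> log := v);
    if attempt = failures then nv := !log
  done;
  !nv

let () =
  Printf.printf "continuous:  %d\n" (run_atomic 5 0);
  Printf.printf "atomic:      %d\n" (run_atomic 5 2);
  Printf.printf "unprotected: %d\n" (run_unprotected 5 2)
```

Starting from 5 with two failures, the atomic-style run ends with 6, the same as the continuous run, whereas the unprotected run ends with 8. Only the former can be simulated by a continuous execution in the sense of the definition.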

Theorem 4 (Adequacy). Consider $b : \mathsf{nat} \mid \Omega \Vdash p : C_{\mathsf{unit}}$, a nonvolatile memory $\mathit{NV}$, and a bijective map $\gamma$ that matches qualifiers and types from variables in $\Omega$ to locations in $\mathit{NV}$. The triple of $p$, $\mathit{NV}$, and $\gamma$ is idempotent.

# 7 Discussion & Related Work

Intermittent Computing. Surbatovich et al. [41] provide the first formal framework for reasoning about intermittent execution, give the correctness definition that we use, and identify precise memory invariants needed for an execution to be correct. Our Crash types capture some of these invariants; capturing all requires reasoning about the effects of non-deterministic sensor inputs, which we leave to future work. This work is the first to treat intermittent operations at the type level and explore the logical interpretation of intermittent execution. We speculate that our type-based approach using logical relations will provide a cleaner foundation for reasoning about the correctness of more complex intermittent systems, e.g., concurrent ones. Other works that investigate the formal properties of intermittent computing either reason about the effects of intermittent execution on peripheral interactions [9] or enforce timeliness constraints on sensor readings [40], which are orthogonal to ours.

Adjoint Logic. Benton et al. [7,8] provided the first categorical foundation for using adjoint functors to combine linear and nonlinear logics and showed that a well-behaved calculus requires an independence principle: linear formulae cannot appear in the assumptions of a nonlinear sequent. Follow-up works further generalized the system [20,21,36]. There, the relation to Pfenning and Davies's [30] formulation of the lax ○ modality was noted; ○ corresponds to UF, where F and U are adjunctions between truth and validity categories. Short of a full Curry-Howard correspondence for our type system and underlying logic, we designed the rules for ↑ and ↓ based on the above calculi. Our stable and unstable contexts correspond to the validity and truth contexts, respectively. Thus, we speculate that the combination ↑↓ in our system corresponds to the lax modality.

Several prior works used type systems with adjoint modalities to model switching between program modes [6,14,34], e.g., switching a process's mode between shared and unshared [6], or adding multicasting, replicable services, and cancellation modes to a session-typed message passing system [34]. We are the first to use these modalities to handle unforeseen shut-downs and distinguish between stable and power-failure prone modes.

Logical Relations. Prior work [3,42] uses step indexing to ensure the well-foundedness of logical relations that handle heaps with cyclic references, dynamic memory allocation, or recursive types. Our Crash types model the infinite computation that an atomic region can experience under a non-deterministic number of power failures and re-executions. This recursion necessitates an indexed relation that limits the number of execution attempts a program can make.

Jung and Tiuryn introduced a logical relation for lambda definability that allows varying arities [18]. The idea is to increase the arity when passing to later worlds instead of starting with a large arity. Our logical relation can also be viewed as a relation with different arities; the initial type of the relation is binary, while after a crash the type of the value relation only corresponds to the intermittent configuration. During these value steps, the relation is unary, with the continuous configuration acting as a Kripke world for the intermittent configuration. After restoration, the relation reverts to binary.

Logical relations have been widely used to prove program equivalence, e.g., [2,3,10,16]. At a high level, idempotency is similar to program equivalence, but it handles re-execution and requires us only to prove simulation from an intermittent to continuous run, not vice-versa.

Algebraic Effect Handlers. Algebraic effect handlers [27,31,32,33] give a unified theory for computational effects, e.g., exceptions and interactive input/output. A handler accesses the continuation to transform the computation. Following effect handler syntax, we write effectful environmental interactions of our system as $\varepsilon \,\#\, \mathsf{in}(b > 0, {\uparrow}\kappa)$, where $b$ refers to a natural number returned by the environment and ${\uparrow}\kappa$ is the continuation. Our restore policy resembles a handler, in that it has access to the continuation, but an atomic region may dismiss the continuation, restarting from a saved command.

Crash Hoare Logic. Crash Hoare logic (CHL) [11] ensures the correctness of crash and restore operations in a file system. CHL extends Hoare logic with a crash condition and a recovery procedure. The crash condition states what happens to the state on a crash. The recovery procedure runs after the crash and manipulates the state before resuming. The system checks that if the program crashes, the storage system will recover to a state consistent with the specifications. Unlike our work, CHL does not target idempotency and requires manual effort to formalize the crash condition and recovery policy. Our syntactic typing fixes the power failure, restore, and commit policies, and our formal results guarantee that following the policies ensures idempotency, the common correctness condition for intermittent execution. We also allow the programmer to formalize bespoke semantically well-typed policies.

# 8 Conclusion

This work provides the first logical interpretation of intermittent execution. It shows that adjoint logic can be applied to define Crash types, which internalize the dualities between stable and unstable values, and complete versus partial (re-)executions of intermittent programs. The typing constraints capture invariants of power failure, restoration, and re-execution in intermittent systems. The proofs of progress, preservation, and the fundamental theorem imply the correctness of intermittent systems, i.e., idempotency of execution.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Gradual Tensor Shape Checking

Momoko Hattori, Naoki Kobayashi, and Ryosuke Sato

The University of Tokyo, Tokyo, Japan {momohatt,koba,rsato}@is.s.u-tokyo.ac.jp

Abstract. Tensor shape mismatch is a common source of bugs in deep learning programs. We propose a new type-based approach to detect tensor shape mismatches. One of the main features of our approach is best-effort shape inference. As the tensor shape inference problem is undecidable in general, we allow static type/shape inference to be performed only in a best-effort manner. If the static inference cannot guarantee the absence of shape inconsistencies, dynamic checks are inserted into the program. Another main feature is gradual typing, where users can improve the precision of the inference by adding appropriate type annotations to the program. We formalize our approach and prove that it satisfies the criteria of gradual typing proposed by Siek et al. in 2015. We have implemented a prototype shape checking tool based on our approach and evaluated its effectiveness by applying it to some deep neural network programs.

# 1 Introduction

Tensor Shape Checking and Its Difficulties. Tensor shape mismatch is one of the common sources of dynamic errors in programs using tensors (i.e., multi-dimensional arrays). For example, the reshape operation of tensors takes a tensor x and an integer list S and returns a new tensor of the shape S obtained by realigning the elements in x. The input and output tensors must have the same number of elements; a tensor of shape [2; 3; 4]<sup>1</sup> can be reshaped into a shape [3; 2; 4], while trying to reshape it into [3; 4] results in a runtime error.
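The reshape condition above amounts to comparing element counts. A minimal sketch in plain OCaml, with shapes represented as integer lists (the helper names `num_elements` and `can_reshape` are hypothetical and not part of OCaml-Torch):

```ocaml
(* Number of elements of a tensor with the given shape. *)
let num_elements (shape : int list) : int =
  List.fold_left ( * ) 1 shape

(* A reshape is admissible only if both shapes hold the same number
   of elements. *)
let can_reshape ~(from : int list) ~(into : int list) : bool =
  num_elements from = num_elements into

let () =
  Printf.printf "%b\n" (can_reshape ~from:[2; 3; 4] ~into:[3; 2; 4]);
  Printf.printf "%b\n" (can_reshape ~from:[2; 3; 4] ~into:[3; 4])
```

The first call prints `true` (24 elements on both sides), the second prints `false` (24 versus 12), which corresponds to the runtime error described above.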

Early detection of tensor shape mismatch errors is critical in particular for deep learning programs, where tensors are frequently used. Since deep learning programs often take a considerable amount of time to train networks, it is often the case that a program takes hours and days to compute the weights of deep neural networks only to be terminated by one tensor shape mismatch error, throwing away the trained weights. Even worse, some tensor shape mismatches can be harder to notice: mixing up the height and the width of square images does not raise runtime errors but degrades the performance of the neural network.

The existing work on static detection of tensor shape mismatch errors can be classified into two categories. One is the whole-program analysis approach [17,31], which collects tensor shape information by partially evaluating the program in the style of abstract interpretation. The other is the type-based approach [3,25], which expresses the shapes of tensors as a part of the type information. Still, none of them is fully satisfactory: either they are too conservative and reject valid programs, or they fail to detect some shape mismatch errors.

<sup>1</sup> In this paper, we denote lists in the OCaml style, as in [1; 2; 3], to disambiguate them from citations.

```
let model s =
  let f = ... in let g = ... in fun x -> let y = f x in g y
let _ = model 1 (Tensor.rand [20])
```
Fig. 1. An OCaml program written with OCaml-Torch.

This paper pursues the type-based approach as it is expected to provide modular detection of tensor shape inconsistencies. Designing an appropriate type system and a type inference procedure to reason about tensor shapes is challenging because shapes are first-class objects. For example, the library function Tensor.zeros of OCaml-Torch [4] (which provides OCaml bindings for libtorch [20]) takes a list S of integers, and returns a new tensor whose shape is S. Thus, we have to work with dependent types: Tensor.zeros would be given the type S : int list → {r : tensor | r.shape = S}. It is difficult to infer such dependent (refinement) types fully automatically. Yet, we wish to avoid burdening programmers with writing too many type annotations.

Another difficulty is that shape constraints can be so complex that even type checking, let alone inference, can be too costly or impossible. For instance, the reshape operation explained earlier needs the proof that the shape of the input tensor x is compatible with the given shape $S = [s_1; \ldots; s_n]$ (i.e., if the shape of x is $[s'_1; \ldots; s'_m]$, then $\prod_{i=1}^{m} s'_i = \prod_{i=1}^{n} s_i$ holds)<sup>2</sup>. Thus, type checking requires complex reasoning about (non-linear) integer arithmetic and lists.
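To see where the non-linear arithmetic comes from, consider a sketch that also handles an unspecified dimension written as −1. It assumes the usual libtorch convention that at most one dimension may be −1 and that its size is inferred from the remaining ones; the function name `resolve_shape` is hypothetical.

```ocaml
(* Resolve a target shape against an input shape. Returns the concrete
   target shape, or None if the element counts cannot be made equal. *)
let resolve_shape (input : int list) (target : int list) : int list option =
  let total = List.fold_left ( * ) 1 input in
  (* Product of the explicitly given target dimensions. *)
  let known =
    List.fold_left (fun acc s -> if s = (-1) then acc else acc * s) 1 target
  in
  match List.filter (fun s -> s = (-1)) target with
  | [] -> if known = total then Some target else None
  | [ _ ] ->
    (* Exactly one unspecified dimension: infer it by division. *)
    if known > 0 && total mod known = 0 then
      Some (List.map (fun s -> if s = (-1) then total / known else s) target)
    else None
  | _ -> None  (* more than one unspecified dimension is ambiguous *)
```

For example, resolving `[3; -1]` against the input shape `[2; 3; 4]` gives `Some [3; 8]`, while `[5; -1]` gives `None` because 24 is not divisible by 5. A static checker has to establish the same divisibility and product facts symbolically, which is what makes the constraint solving hard.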

Overview of Our Approach. Based on the observations above, we propose an approach that is expected to work well in practice despite the above-mentioned difficulties. Our approach can be characterized by three main features: best-effort type inference, hybrid type checking, and gradual typing [27]. We explain them using our prototype tool GraTen<sup>3</sup>.

Best-Effort Type Inference. GraTen does not try to infer the most general types; it performs type/shape inference in a best-effort manner. Thanks to this design choice, GraTen works even if no type annotations are provided (even though the underlying type system involves dependent types), and yet it can statically detect some (though not necessarily all) shape mismatch errors.

As an example, let us consider the program in Figure 1. The function model takes an integer parameter s, defines functions f and g, and returns a layer (which is a function that takes a tensor and returns a tensor) which composes f

<sup>2</sup> Actually, some $s_i$ can be −1, in which case the size of the i-th dimension is unspecified.

<sup>3</sup> The tool is publicly available at https://doi.org/10.5281/zenodo.7590480. The source code is also publicly available at https://github.com/momohatt/graten.

```
let model s =
  let f = ... in let g = ... in
  fun x -> let y = if s = 1 then x else f x in g y
```
Fig. 2. The program from Figure 1 with a small modification.

```
let model s =
  let f = ... in let g = ... in
  fun x -> let y = if s = 1 then x else f x in
    g (assert (y.shape = [10]); y)
```
Fig. 3. The program returned by GraTen given the program in Figure 2.

and g. The definitions of f and g are omitted here, but their types are assumed as below, where s in the type of f is the argument of model and the function nth(n, S) returns the n-th element of the list S (the index starts with 0).

$$\mathtt{f} : x{:}\{\nu : \mathtt{tensor} \mid \mathtt{len}(\nu.\mathtt{shape}) = 1\} \to \mathtt{tensor}([\mathtt{nth}(0, x.\mathtt{shape})/\mathtt{s}])$$

g : tensor([10]) → tensor([1])

These types indicate that f takes a 1-dimensional tensor (i.e., a vector) and returns a vector whose length equals the length of the argument vector divided by s, and that g expects a vector of length 10 and returns a vector of length 1. The formal syntax of types will be introduced later in Section 2.

For the program above, GraTen's best-effort inference outputs the following type for the function model.

s:int → x: {ν:tensor | len(ν.shape) = 1 ∧ nth(0, ν.shape)/s = 10} → tensor([1])

Here, the constraint nth(0, ν.shape)/s = 10 for the shape of x is necessary for this program not to raise a shape mismatch error at the application of g. The inferred type of model is used to prevent any calls to model that violate the constraint. Indeed, GraTen rejects the call on line 4 of Figure 1, where the arguments do not satisfy the constraint nth(0, ν.shape)/s = 10. As in this example, our approach can statically detect shape mismatches when enough type information has been obtained from the best-effort type inference or user-provided type annotations.
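The refinement inferred for the argument x of model can be read as an executable predicate on shapes. The sketch below is only an illustration: the name `model_arg_ok` is hypothetical, division is taken to be OCaml integer division, and the guard `s <> 0` is added here to keep the check total.

```ocaml
(* nth n s: the n-th element of the list s, with indices starting at 0. *)
let nth (n : int) (s : int list) : int = List.nth s n

(* The refinement on x in the inferred type of model:
   len(v.shape) = 1 /\ nth(0, v.shape) / s = 10.
   The test s <> 0 is an addition of this sketch. *)
let model_arg_ok (s : int) (shape : int list) : bool =
  List.length shape = 1 && s <> 0 && nth 0 shape / s = 10

let () =
  Printf.printf "%b %b\n" (model_arg_ok 2 [20]) (model_arg_ok 2 [30])
```

A call to model whose arguments make this predicate false is the kind of call that the inferred type rules out statically.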

Hybrid Type Checking. Another main feature of our approach is hybrid type checking: we combine static and dynamic checking. The type checker inserts assertions at program points where type safety is not statically guaranteed, à la Knowles and Flanagan's hybrid type checking [16]. For example, consider the program in Figure 2, which is obtained by adding a conditional branch to the one in Figure 1. The types of the then and else branches of the if expression are inferred to be tensor(x.shape) and tensor([nth(0, x.shape)/s]), respectively. In this case, the type of y is simply inferred to be tensor without any information about its shape, and the inferred type for model is as follows.

s:int → x:{ν : tensor | len(ν.shape) = 1} → tensor([1])

Thus, the best-effort inference of GraTen fails to capture the constraint nth(0, ν.shape)/s = 10 for x due to the imprecise type information of y. Along with

```
let model s =
  let f = ... in let g = ... in
  fun x ->
    let y = ((if s = 1 then x else f x) : tensor([nth 0 x.shape / s]))
    in g y
```
Fig. 4. The program from Figure 2 after adding type annotations.

the inferred types, GraTen outputs the program in Figure 3, which is the same as the original program except for the assertion inserted at the argument of g. Since the statically inferred type of y fails to guarantee that the application of g to y does not lead to a shape mismatch error, GraTen inserts the assertion to check the requirement dynamically.
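The effect of the inserted assertion can be observed with a runnable analogue of Figure 3. In the sketch below, tensors carry only their shapes, and `f` and `g` are hypothetical stand-ins that follow the types assumed for them in the text (with s passed to f explicitly); the elided definitions of the original program are not reproduced.

```ocaml
(* Toy model: only shapes are tracked. *)
type tensor = { shape : int list }

exception Shape_mismatch of string

(* f: takes a vector of length n, returns a vector of length n / s. *)
let f (s : int) (x : tensor) : tensor =
  match x.shape with
  | [n] -> { shape = [n / s] }
  | _ -> raise (Shape_mismatch "f expects a vector")

(* g: tensor([10]) -> tensor([1]) *)
let g (y : tensor) : tensor =
  if y.shape = [10] then { shape = [1] }
  else raise (Shape_mismatch "g expects shape [10]")

(* Analogue of Figure 3: the assertion guards the call to g. *)
let model (s : int) : tensor -> tensor =
  fun x ->
    let y = if s = 1 then x else f s x in
    g (assert (y.shape = [10]); y)

let () =
  let r = model 2 { shape = [20] } in
  Printf.printf "result has %d dimension(s)\n" (List.length r.shape)
```

With s = 1 and an input of shape [20], the assertion fails before g is ever applied, which is the trapped error that hybrid checking trades for the missing static guarantee.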

Gradual Typing. Lastly, our approach incorporates gradual typing [27]<sup>4</sup> so that users can improve the precision of inferred types by adding type annotations. For example, let us consider the program in Figure 4, which is obtained from the one in Figure 2 by adding a type annotation to y. With this annotation, GraTen infers the same type for model as it did for model in Figure 1, and no assertions are inserted. As such, adding correct type annotations improves the precision of type checking and decreases the number of assertions inserted.

Thanks to the best-effort inference, users need not add type annotations everywhere in the program. They can focus on the program points where the static inference did not perform well, which are indicated by the inserted assertions. We prove that our type system satisfies the gradual guarantee [27], which ensures that adding type annotations preserves the typeability and the behavior of the program (with some assertions inserted) regardless of their precision, as long as the annotations do not disagree with the program.

Among the three features, the notion of hybrid type checking was first proposed by Knowles and Flanagan [16], and our gradual typing is closely related to the gradual refinement types of [18], but we believe that this particular combination of the three features is new. In particular, unlike the original gradual refinement types [18], we insert assertions instead of carrying around evidence terms [11] in the reduction to guarantee type safety.

The contributions are summarized as follows. (i) The formalization of a type system that combines hybrid type checking and gradual typing. We define our type system as the type-based transformation relation from source programs to programs with run-time assertion checks. We prove the soundness of our type system as well (Section 2). (ii) A proof that our system satisfies the gradual guarantee [27] (Section 3). (iii) Implementation of a best-effort type inference

<sup>4</sup> Usually, gradual typing introduces new syntax for gradual types and makes a distinction between static types and gradual types. However, our type system does not have such a distinction; it only uses the standard refinement types. As we see later, we extend the standard refinement type system with cast (assertion) insertion rules so that it can be viewed as a gradualized type system.

$$\begin{aligned} M \text{ (term)} &::= c \mid x \mid \lambda x{:}\tau.M \mid M\,x \mid (M : \tau) \mid \mathtt{let}\ x = M_1\ \mathtt{in}\ M_2\\ &\quad\mid \mathtt{fix}(f{:}(x{:}\tau_1 \to \tau_2), x, M) \mid \mathtt{if}\ x\ \mathtt{then}\ M_1\ \mathtt{else}\ M_2\\ \tau \text{ (type)} &::= \{x : B \mid \varphi\} \mid x{:}\tau_1 \to \tau_2\\ \Gamma \text{ (type env.)} &::= \emptyset \mid \Gamma, x : \tau \qquad \Delta \text{ (base type env.)} ::= \emptyset \mid \Delta, x : B \end{aligned}$$

Fig. 5. Syntax of the source language, the types and the type environments.

in a prototype system, GraTen (Section 4). (iv) An experimental evaluation of GraTen using the examples of deep learning programs bundled in the OCaml-Torch library. We confirm that GraTen can statically type-check the programs effectively with a reasonable amount of type annotations (Section 5).

# 2 A Gradually-Typed Language with Refinement Types

In this section, we formalize our type system and the translation to insert assertions. We first introduce the source and target languages of the translation in Sections 2.1 and 2.2. We then formalize the type system and the translation and prove their soundness in Section 2.3. The gradual guarantee is discussed later in Section 3.

#### 2.1 Source Language

We consider a call-by-value functional language, whose syntax is given in Figure 5. Throughout this paper, n, c, and x respectively denote integers, constants (including integers and primitive functions) and variables. The base types B and refinement predicates ϕ are explained later.

Type annotations can be added to function arguments λx:τ.M, to recursive functions fix(f:(x:τ₁ → τ₂), x, M) and to arbitrary expressions by (M : τ). In the implementation of GraTen, users may omit the type annotations in lambda expressions and recursive functions, as the best-effort type inference tries to complete them.

The argument of a function application and the branching condition of an if-expression are restricted to variables for the sake of simplicity of typing rules. Note that this restriction does not lose generality, as a general function application M₁ M₂ can be normalized to let f = M₁ in let x = M₂ in f x.

Types are defined following the standard definition of refinement types. Intuitively, the type {x:B | ϕ} describes a value x of type B such that ϕ holds. For example, {x:int | x ≥ 0} is the type of non-negative ints. We may omit the refinement predicates when they are true. For example, we may write {x:int | true} as int.

The language presented so far is general; in GraTen it is instantiated to a language for tensor programs by defining the base types and refinement predicates as in Figure 6, and assuming that primitive operations on tensors are included in the set of constants ranged over c. The refinement predicates, shapes

$$\begin{array}{rcl} B \text{ (base type)} & ::= & \mathtt{bool} \mid \mathtt{int} \mid \mathtt{int\ list} \mid \mathtt{tensor} \\ \varphi \text{ (predicate)} & ::= & \mathtt{true} \mid \mathtt{false} \mid s_1 = s_2 \mid S_1 = S_2 \mid x \mid \neg \varphi \mid \varphi_1 \land \varphi_2 \mid \varphi_1 \lor \varphi_2 \\ & \mid & \mathtt{broadcastable}(S_1, S_2) \mid \mathtt{reshapeable}(S_1, S_2) \mid \cdots \\ S \text{ (shape)} & ::= & [s_1; \ldots; s_n] \mid x \mid x.\mathtt{shape} \mid \mathtt{cons}(s, S) \mid \mathtt{append}(S_1, S_2) \mid \mathtt{tail}(S) \\ & \mid & \mathtt{init}(S) \mid \mathtt{insertAt}(s_1, s_2, S) \mid \mathtt{dropAt}(s, S) \mid \mathtt{swap}(s_1, s_2, S) \\ & \mid & \mathtt{reshape}(S_1, S_2) \mid \mathtt{broadcast}(S_1, S_2) \mid \mathtt{matmul}(S_1, S_2) \\ s \text{ (size)} & ::= & n \mid x \mid -s \mid s_1 + s_2 \mid s_1 \times s_2 \mid \frac{s_1}{s_2} \mid \mathtt{head}(S) \mid \mathtt{last}(S) \\ & \mid & \mathtt{len}(S) \mid \mathtt{nth}(s, S) \mid \mathtt{prod}(S) \end{array}$$

Fig. 6. Syntax of base types B and predicates ϕ in GraTen.

$$\begin{aligned} v \text{ (value)} &::= c \mid x \mid [v_1, \ldots, v_n] \mid \lambda x^{\tau}.N \mid \mathtt{fix}(f^{\tau}, x, N)\\ N \text{ (cast term)} &::= v \mid \mathtt{if}\ v\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 \mid N\,v \mid \mathtt{let}\ x^{\tau} = N_1\ \mathtt{in}\ N_2 \mid \mathtt{assert}(\varphi); N \end{aligned}$$

Fig. 7. Syntax of the target language.

and sizes are expressions of type bool, int list and int respectively. The supported predicates are those described by quantifier-free formulas of first-order logic. As shown in the definition, they may use some built-in predicates and functions over integer lists such as append and primitives on integer arithmetic in order to express common tensor operations. We implicitly assume that the refinement predicates are well formed (as defined in the full version [13]).
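As an illustration of how predicates, shapes and sizes fit together, the following OCaml sketch encodes a small fragment of the grammar as algebraic data types and evaluates it against concrete shapes. The constructor names and the environment representation (an association list from tensor variables to their shapes) are assumptions made for this sketch, not GraTen's internal representation.

```ocaml
(* A fragment of the size, shape and predicate expressions. *)
type size =
  | Int of int
  | Add of size * size
  | Mul of size * size
  | Div of size * size
  | Len of shape
  | Nth of size * shape
and shape =
  | Lit of size list
  | ShapeOf of string              (* x.shape *)
  | Cons of size * shape
  | Append of shape * shape

type pred =
  | True
  | SizeEq of size * size
  | ShapeEq of shape * shape
  | Not of pred
  | And of pred * pred

(* env maps tensor variables to concrete shapes. *)
let rec eval_size env = function
  | Int n -> n
  | Add (a, b) -> eval_size env a + eval_size env b
  | Mul (a, b) -> eval_size env a * eval_size env b
  | Div (a, b) -> eval_size env a / eval_size env b
  | Len s -> List.length (eval_shape env s)
  | Nth (i, s) -> List.nth (eval_shape env s) (eval_size env i)
and eval_shape env = function
  | Lit ss -> List.map (eval_size env) ss
  | ShapeOf x -> List.assoc x env
  | Cons (s, ss) -> eval_size env s :: eval_shape env ss
  | Append (a, b) -> eval_shape env a @ eval_shape env b

let rec holds env = function
  | True -> true
  | SizeEq (a, b) -> eval_size env a = eval_size env b
  | ShapeEq (a, b) -> eval_shape env a = eval_shape env b
  | Not p -> not (holds env p)
  | And (p, q) -> holds env p && holds env q

(* len(x.shape) = 1 /\ nth(0, x.shape) / 2 = 10 *)
let example =
  And (SizeEq (Len (ShapeOf "x"), Int 1),
       SizeEq (Div (Nth (Int 0, ShapeOf "x"), Int 2), Int 10))

let () = Printf.printf "%b\n" (holds [ ("x", [20]) ] example)
```

Evaluating a predicate on concrete shapes is what an inserted assertion does at run time; static checking instead has to decide such formulas for all possible shapes.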

#### 2.2 Target Language

As explained in Section 1, we insert run-time checks into places where type safety cannot be statically guaranteed. Figure 7 shows the syntax of programs obtained by the insertion of assertions. The main difference from the source language is the addition of the assertion assert(ϕ); N, which is used to implement the run-time checks. Like Flanagan's hybrid type system [16] (and unlike the blame calculus [32]), we guarantee the safety of target programs by assertions. Compared with the blame calculus, this method is expected to be easier to implement, since most modern programming languages are equipped with assertions, and more efficient, in that it avoids the accumulation of dynamic casts at runtime. This implementation of the dynamic cast is possible because our system is only "gradualized" at the predicate level of the refinement types, while the underlying simple types are static.

Another difference is that the binders in let expressions are annotated with their type. This is required when defining the precision relation over the cast terms in Section 3.

The substitution and the reduction rules of the cast terms are presented in Figure 8. The evaluation ev(c, v) of a primitive function is defined to be the return value of the primitive function c applied to an argument v if v meets the

$$\begin{array}{c} \boxed{[v/x]N} \\[2pt] [v/x](\mathtt{assert}(\varphi); N) = \mathtt{assert}([v/x]\varphi); [v/x]N \qquad [v/x](\lambda y^{\tau}.N) = \lambda y^{[v/x]\tau}.[v/x]N \\[6pt] \boxed{N_1 \longrightarrow N_2} \\[2pt] \mathtt{assert}(\mathtt{true}); N \longrightarrow N \qquad \mathtt{assert}(\mathtt{false}); N \longrightarrow \mathtt{error} \end{array}$$

Fig. 8. Selected rules of substitution and reduction of the target language (the full definition is given in the full version [13]).

$$\Gamma; \varphi \vdash c : ty(c)\ \text{(CT-Con)} \qquad \frac{\Gamma(x) = y{:}\tau_1 \to \tau_2}{\Gamma; \varphi \vdash x : \Gamma(x)}\ \text{(CT-VF)} \qquad \frac{\Gamma(x) = \{y : B \mid \varphi'\}}{\Gamma; \varphi \vdash x : \{y : B \mid y = x\}}\ \text{(CT-VB)}$$

$$\frac{\Gamma, x{:}\tau_1; \varphi \vdash N : \tau_2}{\Gamma; \varphi \vdash \lambda x^{\tau_1}.N : x{:}\tau_1 \to \tau_2}\ \text{(CT-Lam)} \qquad \frac{\Gamma; \varphi \vdash N : x{:}\tau_1 \to \tau_2 \qquad \Gamma; \varphi \vdash v : \tau_1}{\Gamma; \varphi \vdash N\,v : [v/x]\tau_2}\ \text{(CT-App)}$$

$$\frac{\Gamma, f{:}(x{:}\tau_1 \to \tau_2), x{:}\tau_1; \varphi \vdash N : \tau_2}{\Gamma; \varphi \vdash \mathtt{fix}(f^{x:\tau_1 \to \tau_2}, x, N) : x{:}\tau_1 \to \tau_2}\ \text{(CT-Fix)} \qquad \frac{\Gamma; \varphi \land \varphi' \vdash N : \tau}{\Gamma; \varphi \vdash \mathtt{assert}(\varphi'); N : \tau}\ \text{(CT-Ass)}$$

$$\frac{\Gamma; \varphi \vdash v : \{x : \mathtt{bool} \mid \varphi'\} \qquad \Gamma; \varphi \land v \vdash N_1 : \tau \qquad \Gamma; \varphi \land \neg v \vdash N_2 : \tau}{\Gamma; \varphi \vdash \mathtt{if}\ v\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 : \tau}\ \text{(CT-If)}$$

$$\frac{\Gamma; \varphi \vdash N_1 : \tau_1 \qquad \Gamma, x{:}\tau_1; \varphi \vdash N_2 : \tau}{\Gamma; \varphi \vdash \mathtt{let}\ x^{\tau_1} = N_1\ \mathtt{in}\ N_2 : \tau}\ \text{(CT-Let)} \qquad \frac{\Gamma; \varphi \vdash N : \tau' \qquad \Gamma; \varphi \vdash \tau' <: \tau}{\Gamma; \varphi \vdash N : \tau}\ \text{(CT-Sub)}$$

constraint of the argument of c, and is otherwise undefined. We write N ⇑ if there exists an infinite reduction sequence from N.

The substitution for cast terms is defined in the standard manner, except that the implicitly-annotated type information and the predicate in the assertion need to be updated as well. As can be seen in the definition of the cast term reduction, these implicitly-annotated types are only required for the sake of formalization and ignored at runtime.
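The run-time behavior of assertions can be sketched with a tiny small-step evaluator. The sketch assumes the usual reading of the two assertion rules (a true assertion steps to its continuation, a false one steps to error) and uses a drastically simplified term type in which the asserted predicate has already been evaluated to a boolean; both simplifications are assumptions of this example.

```ocaml
(* Simplified cast terms: values, assertions with an already evaluated
   condition, and the error state. *)
type term =
  | Value of int
  | Assert of bool * term
  | Error

(* One reduction step, or None if the term cannot step. *)
let step : term -> term option = function
  | Assert (true, n) -> Some n
  | Assert (false, _) -> Some Error
  | Value _ | Error -> None

(* Reflexive transitive closure of step. *)
let rec run (t : term) : term =
  match step t with
  | Some t' -> run t'
  | None -> t

let () =
  match run (Assert (true, Assert (false, Value 3))) with
  | Error -> print_endline "error"
  | Value n -> Printf.printf "value %d\n" n
  | Assert _ -> print_endline "stuck"
```

Note that the type annotations on binders play no role in such an evaluator, in line with the remark that they are ignored at runtime.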

We also introduce the type derivation rules for the cast terms Γ; ϕ ⊢ N : τ in Figure 9. This relation is used in the discussion of the soundness of the type system later in Section 2.3. The 4-ary relation Γ; ϕ ⊢ N : τ denotes that a cast term N has type τ under a type environment Γ and a logical context ϕ. The logical context ϕ holds the predicates that are known to be valid at the respective program point. New predicates are added at the then branch and the else branch in (CT-If), and at the post-assertion cast term in (CT-Ass). Subsumption is allowed in (CT-Sub) by the subtyping relation Γ; ϕ ⊢ τ₁ <: τ₂ (Figure 10), which is defined in a standard manner.

[Figure 10: rules of the subtyping relation Γ; ϕ ⊢ τ₁ <: τ₂.]

[Figure 11: rules of the type judgement and cast insertion relation Γ; ϕ ⊢ M ⇝ N : τ.]

#### 2.3 Typing Rules

Inserting Assertions. Next, we discuss the typing rules for the source language and the assertion insertion into it. Figure 11 defines the type judgement and cast insertion relation. The intuition behind the 5-ary relation Γ; ϕ ⊢ M ⇝ N : τ is as follows: under a type environment Γ and a logical context ϕ, a term M translates to a cast term N and has type τ. If we ignore the part "⇝ N" and replace the consistent subtyping relation ≲ with the standard subtyping relation on refinement types (Figure 10), our type system is a standard refinement type system. Thus, the main novelty in the rules in Figure 11 lies in the use of the consistent subtyping relation Γ; ϕ ⊢ τ₁ ≲ τ₂ ⇝ N, which is explained below.

The consistent subtyping relation Γ; ϕ ⊢ τ₁ ≲ τ₂ ⇝ N (Figure 12)<sup>5</sup> is used in the cast insertion relation to guarantee that there exists a value that has both of the types τ₁ and τ₂ under Γ and ϕ, and to produce an assertion term N that checks at runtime whether a value that is statically known to be of type τ₁ can be used as a value of type τ₂.

The rule for the base case (Cast-Base) checks if there exists a value, and an assignment of values to the variables in the type environment, that satisfies both τ₁ and τ₂. This intuitively holds if τ₁ is castable to τ₂ for some runtime values. The rule also produces a lambda function that implements the cast with an assertion. It is defined in such a way that ϕ₂ can always be used as the content of the assertion ϕ′, but true can also be used for ϕ′ if ϕ₁ implies ϕ₂. Note that the definition cannot simply fix ϕ₂ as the content of the assertion, since otherwise Proposition 1 would not hold.
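The two checks performed by (Cast-Base) can be mimicked on a toy scale. In the sketch below, refinements over a single integer variable are OCaml predicates, and satisfiability and validity are approximated by enumerating a small finite domain; a real implementation would discharge these queries with a solver. The names `cast_base` and `apply_cast` and the finite domain are assumptions of this illustration.

```ocaml
(* Finite stand-in for the integers: -100 .. 100. *)
let domain : int list = List.init 201 (fun i -> i - 100)

let satisfiable (p : int -> bool) : bool = List.exists p domain
let valid (p : int -> bool) : bool = List.for_all p domain

(* cast_base phi1 phi2 returns None when no value satisfies both
   refinements, so the cast is rejected statically. Otherwise it
   returns the predicate to assert at run time: the trivial one when
   phi1 implies phi2, and phi2 itself in the remaining cases. *)
let cast_base (phi1 : int -> bool) (phi2 : int -> bool)
  : (int -> bool) option =
  if not (satisfiable (fun x -> phi1 x && phi2 x)) then None
  else if valid (fun x -> not (phi1 x) || phi2 x) then Some (fun _ -> true)
  else Some phi2

(* The cast function: assert the chosen predicate, then return x. *)
let apply_cast (phi : int -> bool) (x : int) : int =
  assert (phi x);
  x

let () =
  match cast_base (fun x -> x > 0) (fun x -> x > 5) with
  | None -> print_endline "rejected statically"
  | Some phi -> Printf.printf "%d\n" (apply_cast phi 7)
```

The middle branch is the case that matters for Proposition 1: when the source refinement already implies the target one, the generated assertion is trivially true and can later be erased.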

The rule for the function types (Cast-Fun) recursively checks the castability of the argument types and the return types and combines the assertion terms for them. Notice how the subsumption for the return types τ₂ and τ₄ has the meet of two argument types τ₁ ⊓ τ₃ in the type environment. The meet of two types (Figure 12) is defined as a conjunction of the refinement predicates<sup>6</sup>.
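The way a function cast combines an argument cast with a result cast can be sketched as a higher-order wrapper. This is one plausible reading of how the two assertion terms are composed, written with hypothetical helper names (`check`, `cast_fun`); it is not a transcription of the term produced by (Cast-Fun).

```ocaml
(* check name p: a first-order cast that fails when p does not hold. *)
let check (name : string) (p : 'a -> bool) (x : 'a) : 'a =
  if p x then x else failwith name

(* cast_fun cast_arg cast_res f wraps f so that the argument is
   checked on the way in and the result on the way out. *)
let cast_fun (cast_arg : 'a -> 'a) (cast_res : 'b -> 'b) (f : 'a -> 'b)
  : 'a -> 'b =
  fun x -> cast_res (f (cast_arg x))

(* Example: view the successor function as a function from
   non-negative integers to integers below 10. *)
let safe_succ : int -> int =
  cast_fun (check "arg" (fun n -> n >= 0))
           (check "res" (fun n -> n < 10))
           (fun n -> n + 1)

let () = Printf.printf "%d\n" (safe_succ 3)
```

The argument cast runs in the contravariant direction (from the new argument type to the old one), which mirrors the first premise of the rule, where τ₃ is cast to τ₁.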

The consistent subtyping relation can be seen as a gradualization of the subtyping relation Γ; ϕ ⊢ τ₁ <: τ₂ (Figure 10). In fact, when a type τ₁ is a subtype of another type τ₂, it is possible that the assertion term generated by casting τ₁ to τ₂ only contains assertions that always succeed, which can be erased by some optimization. The following proposition states this fact. Note that this corresponds to the blame-subtyping theorem, one of the criteria for gradual typing presented in [27].

Proposition 1. Γ; ϕ ⊢ τ₁ <: τ₂ implies Γ; ϕ ⊢ τ₁ ≲ τ₂ ⇝ N for some N where all the assertions in N are of the form assert(true); N′.

<sup>5</sup> This can be understood as the refinement-type version of the differential subtyping in [23], although in the implementation we do not calculate the "difference" between ϕ₁ and ϕ₂ for ϕ′ in the assertion unless ϕ₁ implies ϕ₂ (and thus ϕ′ can be true).

<sup>6</sup> Although the meet of two function types is defined, it does not make any difference in the definition of the consistent subtyping relation, since function types in the type environment are not used.

$$\begin{aligned} \{x:B \mid \varphi_1\} \sqcap \{x:B \mid \varphi_2\} &= \{x:B \mid \varphi_1 \land \varphi_2\} \\ (x{:}\tau_1 \to \tau_2) \sqcap (x{:}\tau_3 \to \tau_4) &= x{:}(\tau_1 \sqcap \tau_3) \to (\tau_2 \sqcap \tau_4) \end{aligned}$$

$$\frac{\vDash \exists \mathrm{BT}(\Gamma), x{:}B.\ \Phi(\Gamma) \land \varphi \land \varphi_1 \land \varphi_2 \qquad \vDash \forall \mathrm{BT}(\Gamma), x{:}B.\ \Phi(\Gamma) \land \varphi \land \varphi_1 \Rightarrow (\varphi' \Leftrightarrow \varphi_2)}{\Gamma; \varphi \vdash \{x:B \mid \varphi_1\} \lesssim \{x:B \mid \varphi_2\} \leadsto \lambda x^{\{x:B \mid \varphi_1\}}.\mathtt{assert}(\varphi'); x}\ \text{(Cast-Base)}$$

$$\frac{\Gamma; \varphi \vdash \tau_3 \lesssim \tau_1 \leadsto N_1 \qquad \Gamma, x{:}\tau_1 \sqcap \tau_3; \varphi \vdash \tau_2 \lesssim \tau_4 \leadsto N_2}{\Gamma; \varphi \vdash x{:}\tau_1 \to \tau_2 \lesssim x{:}\tau_3 \to \tau_4 \leadsto \lambda f^{x:\tau_1 \to \tau_2}.\lambda x^{\tau_3}.\ \cdots}\ \text{(Cast-Fun)}$$

Fig. 12. Definition of the consistent subtyping relation Γ; ϕ ⊢ τ₁ ≲ τ₂ ⇝ N.

Type Safety. We conclude this section with a note on the soundness of our type system. The soundness is based on the fact that if the source program is well-typed, the program after the assertion insertion is also well-typed.

The most critical part of the proof is to show that the assertion term can be assigned a function type from the pre-assertion type to the post-assertion type.

Lemma 1. Γ; ϕ ⊢ τ₁ ≲ τ₂ ⇝ N implies Γ; ϕ ⊢ N : x:τ₁ → τ₂ for some variable x that does not occur in τ₂.

The proof is found in the full version [13]. With Lemma 1, we can prove that the assertion-inserted program can be assigned the same type as that of the original program.

Lemma 2 (Assertion Insertion Preserves Types). Γ; ϕ ⊢ M ⇝ N : τ implies Γ; ϕ ⊢ N : τ.

We can also prove the standard progress and preservation properties under a reasonable assumption that the types of the primitive functions are properly defined as follows (see the full version [13] for the proofs).

Assumption 1. ⊢ c v : τ implies that ev(c, v) is defined and ⊢ ev(c, v) : τ.

Combining Lemma 2 with the progress and preservation properties, we obtain the type safety as follows.

Theorem 1 (Type Safety). With Assumption 1, ∅; true ⊢ M ⇝ N : τ implies N ⟶* v for some v, N ⇑, or N ⟶* error.

The type safety property states that a well-typed program does not cause untrapped dynamic errors. The only case where a cast-inserted program causes untrapped errors is when the result of an application of a primitive function is undefined (i.e., ev(c, v) is undefined). The type safety property ensures that such untrapped errors do not happen for well-typed terms as long as ty(c) is defined appropriately.

$$\boxed{\widetilde{x} \vdash \tau_1 \sqsubseteq \tau_2} \qquad \frac{\vDash \forall \widetilde{y}, x.\ \varphi_1 \Rightarrow \varphi_2}{\widetilde{y} \vdash \{x:B \mid \varphi_1\} \sqsubseteq \{x:B \mid \varphi_2\}}\ \text{(Prec-Base)} \qquad \frac{\widetilde{y} \vdash \tau_1 \sqsubseteq \tau_3 \qquad \widetilde{y}, x \vdash \tau_2 \sqsubseteq \tau_4}{\widetilde{y} \vdash x{:}\tau_1 \to \tau_2 \sqsubseteq x{:}\tau_3 \to \tau_4}\ \text{(Prec-Fun)}$$

$$\boxed{\Gamma_1 \sqsubseteq \Gamma_2} \qquad \emptyset \sqsubseteq \emptyset \qquad \frac{\Gamma_1 \sqsubseteq \Gamma_2 \qquad \mathrm{dom}(\Gamma_1) \vdash \tau_1 \sqsubseteq \tau_2}{\Gamma_1, x{:}\tau_1 \sqsubseteq \Gamma_2, x{:}\tau_2}$$

Fig. 13. Precision relation of types and type environments.

# 3 Gradual Guarantee

In a standard gradual type system, programs are compared by their precision, that is, the amount of information contained in the type annotations. This notion is used to define the gradual guarantee [27], which is the core property of gradual typing. The gradual guarantee comes in two parts. The first one is called the static gradual guarantee, which states that decreasing the precision of the type annotations in a well-typed program preserves the typeability of the program at a less precise type. The second one is called the dynamic gradual guarantee, which claims that a less precise program behaves the same as the more precise one, with fewer assertion errors.

Below we first define the precision for the language introduced in Section 2. We then show that our type system satisfies the gradual guarantee.

Precision. Figure 13 defines the precision relation x̃ ⊢ τ₁ ⊑ τ₂ on types by using the logical implication between the refinement predicates. The sequence of variables x̃ keeps the variables that may appear in the refinement predicates. For example, the following is an instance of the type precision relation for the base type.

⊢ {x : tensor | x.shape = [3]} ⊑ {x : tensor | len(x.shape) = 1}
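The base-type precision relation is an implication between refinements, which can be spot-checked on sample shapes. The sketch below tests the example above on a handful of shapes; the helper name `more_precise_on` and the sample set are assumptions of this illustration, and a finite test of this kind can refute an implication but cannot prove it.

```ocaml
(* more_precise_on samples phi1 phi2: phi1 implies phi2 on every
   sample shape. This only approximates the logical implication. *)
let more_precise_on (samples : int list list)
    (phi1 : int list -> bool) (phi2 : int list -> bool) : bool =
  List.for_all (fun s -> not (phi1 s) || phi2 s) samples

let samples = [ []; [3]; [4]; [3; 3]; [1; 2; 3] ]

let is_vec3 (s : int list) : bool = s = [3]            (* x.shape = [3] *)
let is_vector (s : int list) : bool = List.length s = 1 (* len(x.shape) = 1 *)

let () =
  Printf.printf "%b %b\n"
    (more_precise_on samples is_vec3 is_vector)
    (more_precise_on samples is_vector is_vec3)
```

The converse direction fails on the sample [4], which is a vector but does not have shape [3]; this matches the reading that the left-hand type carries more information.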

Note that in the rule (Prec-Fun), the argument types and the return types are compared independently; the type information on x is not used in the comparison of the return types. This is in contrast with the rule (Sub-Fun) in Figure 10 for subtyping. Figure 13 also extends the relation to Γ ⊑ Γ′ on type environments. The precision relation is also extended to the relation x̃ ⊢ M ⊑ M′ on terms, by the rules in Figure 14. Here, x̃ is the sequence of variables in scope. Finally, we define the precision relation of the cast terms in Figure 14. Unlike the term precision relation, the precision relation Γ; ϕ ⊢ N₁ ⊑ N₂ on cast terms requires the type environment Γ and the logical context ϕ in the judgement, and the refinement extraction from the type environment Φ(Γ) is used in the rule (PC-Assert). We also assume the following property on the evaluation of the primitive functions.

Assumption 2. If ev(c, v₁) and ev(c, v₂) are both defined, then v₁ ⊑ v₂ implies ev(c, v₁) ⊑ ev(c, v₂).

[Figure 14: rules of the precision relations x̃ ⊢ M₁ ⊑ M₂ on terms and Γ; ϕ ⊢ N₁ ⊑ N₂ on cast terms, including (PC-Assert), which relates assert(ϕ₁); N₁ and assert(ϕ₂); N₂.]

Fig. 14. Selected rules for the precision relation on terms and cast terms (the full definition is found in the full version [13]).

Intuitively, the precision relation on cast terms is designed so that, when $\emptyset; \mathsf{true} \vdash N\_1 \sqsubseteq N\_2$ holds, the assertions in $N\_1$ are stricter than those in $N\_2$, and therefore the dynamic checks in $N\_1$ are more likely to fail than those in $N\_2$. The following two propositions state this intuition formally (the proofs are found in the full version [13]).

Proposition 2. Suppose $\emptyset; \mathsf{true} \vdash N\_1 : \tau$ and $\emptyset; \mathsf{true} \vdash N\_2 : \tau'$. Then, $\emptyset; \mathsf{true} \vdash N\_1 \sqsubseteq N\_2$ and $N\_1 \longrightarrow N\_1'$ imply $N\_2 \longrightarrow N\_2'$ and $\emptyset; \mathsf{true} \vdash N\_1' \sqsubseteq N\_2'$ for some $N\_2'$.

Proposition 3. Suppose $\emptyset; \mathsf{true} \vdash N\_1 : \tau$ and $\emptyset; \mathsf{true} \vdash N\_2 : \tau'$. Then, $\emptyset; \mathsf{true} \vdash N\_1 \sqsubseteq N\_2$ and $N\_2 \longrightarrow N\_2'$ imply either of the following.

– $N\_1 \longrightarrow N\_1'$ and $N\_1' \sqsubseteq N\_2'$ for some $N\_1'$

– $N\_1 \longrightarrow \mathsf{error}$
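
As an informal illustration of these two propositions (not part of the formal development), the following OCaml sketch models a cast term as a guarded computation. The type `outcome`, the names `guarded`, `n1`, `n2` and `agrees`, and the concrete predicates are assumptions made for this example only. It checks on a few inputs that whenever the term with the stricter assertion produces a value, the term with the weaker assertion produces the same value, while the stricter term may fail where the weaker one succeeds.

```ocaml
(* Outcome of running a cast term: a value, or a failed dynamic check. *)
type 'a outcome = Value of 'a | Failed

(* assert(phi); body, modelled as a guard in front of a computation. *)
let guarded (phi : int -> bool) (body : int -> int) (x : int) : int outcome =
  if phi x then Value (body x) else Failed

(* n1 carries the stricter assertion, n2 the weaker one (phi1 implies phi2). *)
let n1 = guarded (fun x -> x > 0 && x mod 2 = 0) (fun x -> x / 2)
let n2 = guarded (fun x -> x > 0) (fun x -> x / 2)

(* If n1 yields a value, n2 must yield the same value; n1 is allowed to fail. *)
let agrees x =
  match n1 x, n2 x with
  | Value a, Value b -> a = b
  | Failed, _ -> true
  | Value _, Failed -> false

let () =
  List.iter (fun x -> assert (agrees x)) [ -2; 0; 1; 2; 3; 4 ];
  (* The stricter term fails on 3, the weaker one succeeds. *)
  assert (n1 3 = Failed && n2 3 = Value 1)
```

Values are compared with plain equality here, which is a simplification of the precision relation on values.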

Gradual Guarantee. We show that our system satisfies the gradual guarantee [27]. First, we prove that the consistent subtyping relation $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ is upper-closed with respect to the precision relation $\widetilde{x} \vdash \tau\_1 \sqsubseteq \tau\_3$ on types.

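Lemma 3. $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N\_1$, $\mathrm{dom}(\Gamma) \vdash \tau\_1 \sqsubseteq \tau\_3$, $\mathrm{dom}(\Gamma) \vdash \tau\_2 \sqsubseteq \tau\_4$ and $\Gamma \sqsubseteq \Gamma'$ imply $\Gamma'; \varphi \vdash \tau\_3 \lesssim \tau\_4 \rightsquigarrow N\_2$ for some $N\_2$.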

We can further prove that the cast term $N\_2$ in the statement of Lemma 3 is less precise than the original cast term $N\_1$ as follows.

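Lemma 4. Suppose $\Gamma \sqsubseteq \Gamma'$, $\mathrm{dom}(\Gamma) \vdash \tau\_1 \sqsubseteq \tau\_1'$ and $\mathrm{dom}(\Gamma) \vdash \tau\_2 \sqsubseteq \tau\_2'$. Then, $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ and $\Gamma'; \varphi \vdash \tau\_1' \lesssim \tau\_2' \rightsquigarrow N'$ imply $\Gamma; \varphi \vdash N \sqsubseteq N'$.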

Using the above properties, we can prove the following lemma which constitutes the core part of the proof of the gradual guarantee.

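Lemma 5. $\Gamma \sqsubseteq \Gamma'$, $\mathrm{dom}(\Gamma) \vdash M \sqsubseteq M'$ and $\Gamma; \varphi \vdash M \rightsquigarrow N : \tau$ imply $\Gamma'; \varphi \vdash M' \rightsquigarrow N' : \tau'$, $\Gamma; \varphi \vdash N \sqsubseteq N'$ and $\mathrm{dom}(\Gamma) \vdash \tau \sqsubseteq \tau'$ for some $N'$ and $\tau'$.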

Finally, we can show the static and dynamic gradual guarantee as follows.

Theorem 2 (Static gradual guarantee). $\emptyset \vdash M\_1 \sqsubseteq M\_2$ and $\vdash M\_1 : \tau\_1$ imply $\vdash M\_2 : \tau\_2$ and $\emptyset \vdash \tau\_1 \sqsubseteq \tau\_2$ for some $\tau\_2$.

Proof. This follows immediately from Lemma 5. $\square$

Theorem 3 (Dynamic gradual guarantee). Suppose $\emptyset \vdash M\_1 \sqsubseteq M\_2$ and $\vdash M\_1 \rightsquigarrow N\_1 : \tau\_1$. Then, there exist $N\_2$ and $\tau\_2$ that satisfy all of the following.

– $\vdash M\_2 \rightsquigarrow N\_2 : \tau\_2$ and $\emptyset \vdash \tau\_1 \sqsubseteq \tau\_2$

– $N\_1 \longrightarrow^{\ast} v\_1$ implies $N\_2 \longrightarrow^{\ast} v\_2$ for some $v\_2$ such that $v\_1 \sqsubseteq v\_2$

– $N\_1 \longrightarrow^{\infty}$ implies $N\_2 \longrightarrow^{\infty}$

– $N\_2 \longrightarrow^{\ast} v\_2$ implies $N\_1 \longrightarrow^{\ast} v\_1$ for some $v\_1$ such that $v\_1 \sqsubseteq v\_2$, or $N\_1 \longrightarrow^{\ast} \mathsf{error}$

– $N\_2 \longrightarrow^{\infty}$ implies $N\_1 \longrightarrow^{\infty}$ or $N\_1 \longrightarrow^{\ast} \mathsf{error}$


Proof. By Lemma 5, $\vdash M\_2 \rightsquigarrow N\_2 : \tau\_2$ holds for some $N\_2$ and $\tau\_2$ where $\vdash N\_1 \sqsubseteq N\_2$ and $\vdash \tau\_1 \sqsubseteq \tau\_2$. Also, from Lemma 2, we obtain $\vdash N\_1 : \tau\_1$ and $\vdash N\_2 : \tau\_2$. Using Proposition 2, $N\_1 \longrightarrow^{\ast} v\_1$ for some $v\_1$ implies $N\_2 \longrightarrow^{\ast} v\_2$ for some $v\_2$ such that $v\_1 \sqsubseteq v\_2$. Also, $N\_1 \longrightarrow^{\infty}$ implies $N\_2 \longrightarrow^{\infty}$. Using Proposition 3, $N\_2 \longrightarrow^{\ast} v\_2$ for some $v\_2$ implies $N\_1 \longrightarrow^{\ast} v\_1$ for some $v\_1$ such that $v\_1 \sqsubseteq v\_2$, or $N\_1 \longrightarrow^{\ast} \mathsf{error}$. Also, $N\_2 \longrightarrow^{\infty}$ implies $N\_1 \longrightarrow^{\infty}$ or $N\_1 \longrightarrow^{\ast} \mathsf{error}$. $\square$

# 4 Best-Effort Type Inference

Thanks to our combination of gradual typing and hybrid checking described in the previous sections, a type inference procedure need not necessarily output the most precise types. It is allowed to perform type inference only in a best-effort manner, and the results in the previous sections do not depend on the particular design of the type inference procedure. Nevertheless, it is desirable for the procedure to infer reasonably good types. In this section, we report a specific design of the type inference procedure, which we have implemented in our prototype system GraTen; as reported in Section 5, our procedure works reasonably well for actual deep learning programs.

# 4.1 Overview of Type Inference and Checking in GraTen

The type checking in GraTen consists of the following three phases: (1) simple type inference, (2) best-effort refinement type inference, and (3) consistent subtyping checking and assertion insertion.

In the first phase, GraTen performs the simple type inference using the standard Hindley-Milner algorithm and annotates the AST with the inferred simple types of each node.

In the second phase, GraTen first collects all the consistent subtyping constraints of the form $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ from the source program. When it encounters AST nodes whose refinement type cannot be constructed directly, GraTen generates template refinement types using the simple types inferred in the previous phase. Template refinement types may contain variables for undetermined predicates (referred to as predicate variables).

Using the collected constraints, GraTen then tries to find a solution for all of the predicate variables with its hand-made constraint solver. Constraint solving takes place at every let binding to allow let-polymorphism on shapes. We discuss the details of the implementation of the solver in the next subsection, but at a high level, the solver tries to find a solution such that:


Given that the subtyping constraints can be expressed in the form of constrained Horn clauses (CHC) and not all the subtyping constraints need to hold, the problem above is essentially a CHC solving problem with weak constraints and maximality [22] where the optimization objective of the problem is defined by pointwise logical comparison of the solutions.

The constraint solver of GraTen does not always find a solution for all predicate variables. In such cases, GraTen assigns true to the undetermined predicate variables; that way, they will at least not invalidate the consistent subtyping constraints.

Note that GraTen does not take into account the consistent subtyping $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ itself when trying to find a solution, as we expect that it would be rare for a consistent subtyping $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ to hold when the subtyping relation $\Gamma; \varphi \vdash \tau\_1 <: \tau\_2$ does not hold. GraTen therefore defers the check of consistent subtyping constraints to the next phase.

In the third phase, GraTen checks the validity of the consistent subtyping constraints using the inference results for the predicate variables from the previous phase. GraTen first attempts to simplify and verify the constraints with a hand-made solver, and falls back on Z3 [5] with timeouts if that fails. Simultaneously, it also generates the assertion terms and inserts them into the source program.

# 4.2 Heuristics of Best-Effort Type Inference

To solve the subtyping constraints explained above, we have implemented a hand-made constraint solver. GraTen does not use off-the-shelf SMT or CHC solvers such as Z3 [5], since the refinement predicates in GraTen often use complicated predicates on integer lists, for which standard SMT/CHC solvers cannot find a solution in a reasonable time. Also, while GraTen should infer general types (so as not to reject well-typed programs), those generic solvers are not biased towards generality and return any (non-general) solution that satisfies the constraints. This subsection describes the heuristics used in GraTen for constraint solving.

Preparation for the inference already starts when GraTen generates the template refinement types during the constraint collection. For each predicate variable generated, GraTen attaches the set of program variables it depends on, which is calculated from the type environment. This is used in the constraint solving later to avoid assigning irrelevant predicates to the predicate variables. We denote predicate variables as $p\_{\widetilde{x}}(\widetilde{y})$, where $\widetilde{x}$ denotes the set of program variables it depends on and $\widetilde{y}$ denotes the parameters of the predicate variable.

After collecting the constraints, GraTen decomposes the subtyping constraints into constrained Horn clauses of the form $\widetilde{\varphi\_1} \land \widetilde{\varphi\_2} \Rightarrow \widetilde{\varphi\_3}$ following the definition of the subtyping relation (Figure 10). The notation $\widetilde{\varphi}$ denotes a set of predicates, logically interpreted as the conjunction of the predicates. The first, second, and third sets of predicates in the clause correspond respectively to the predicates from the context $\Phi(\Gamma) \land \varphi$, the refinement $\varphi\_1$ of the type on the left, and the refinement $\varphi\_2$ of the type on the right. We intentionally distinguish between $\widetilde{\varphi\_1}$ and $\widetilde{\varphi\_2}$ on the left-hand side of the clauses in describing the constraint solving algorithm. For example, let us reconsider the program in Figure 2. The subtyping constraints collected from the if expression of the program would be as follows, where $p$, $q$ and $r$ are the predicate variables generated for the types of $s$, $x$ and the if expression respectively.

$$\begin{aligned} &\Gamma; (s=1) \vdash \{\nu \text{:tensor} \mid q\_{s,\nu}(\nu)\} <: \{\nu \text{:tensor} \mid r\_{s,x,\nu}(\nu)\} \\ &\Gamma; (s \neq 1) \vdash \{\nu \text{:tensor} \mid q\_{s,\nu}(\nu)\} <: \{\nu \text{:tensor} \mid \mathtt{len}(\nu.\mathtt{shape}) = 1\} \\ &\Gamma; (s \neq 1) \vdash \textbf{tensor} ([\mathtt{nth}(0, x.\mathtt{shape})/s]) <: \{\nu \text{:tensor} \mid r\_{s,x,\nu}(\nu)\} \\ &\text{where } \Gamma \coloneqq [s \mapsto \{\nu \text{:int} \mid p\_{\nu}(\nu)\}, x \mapsto \{\nu \text{:tensor} \mid q\_{s,\nu}(\nu)\}] \end{aligned}$$

These constraints are decomposed into the following clauses.

$$\begin{aligned} &\{p\_s(s), q\_{s,x}(x), s=1\} \land \{q\_{s,\nu}(\nu)\} \Rightarrow r\_{s,x,\nu}(\nu) \\ &\{p\_s(s), q\_{s,x}(x), s \neq 1\} \land \{q\_{s,\nu}(\nu)\} \Rightarrow \mathtt{len}(\nu.\mathtt{shape}) = 1 \\ &\{p\_s(s), q\_{s,x}(x), s \neq 1\} \land \{\nu.\mathtt{shape} = [\mathtt{nth}(0, x.\mathtt{shape})/s]\} \Rightarrow r\_{s,x,\nu}(\nu) \end{aligned} \tag{1}$$

From the clauses obtained as above, GraTen tries to find a solution for the predicate variables using the procedure presented in Algorithm 1.

The algorithm processes the constraints by first trying to find a solution for predicate variables that occur on the right-hand side of a clause $\widetilde{\varphi\_1} \land \widetilde{\varphi\_2} \Rightarrow \widetilde{\varphi\_3}$ (Lines 6–10), and then for those on the left-hand side of a clause (Lines 11–15), and repeats this until either all of the constraints are solved or the constraints cannot be processed any further (Line 4). In Line 8 and Line 13, the set of program variables $\widetilde{x}$ of a predicate variable $p\_{\widetilde{x}}$ is used to assign the predicates to the predicate variables<sup>7</sup>.

<sup>7</sup> The set of program variables used in a predicate is defined following the standard definition of free variables, except that the program variables used in a predicate variable $p\_{\widetilde{x}}$ are defined as $\widetilde{x}$.

During the iteration, the constraints occasionally need to be updated with the current solutions $\theta$ by applying the substitution $\theta$ to all the predicates in the constraints. After that, we also simplify the set of clauses (with simplify in Algorithm 1) by removing the predicates from the right-hand side of a clause that trivially follow from the left-hand side, and by removing clauses whose right-hand side is empty. For example, a clause $\{\} \land \{x = 1\} \Rightarrow \{x = 1\}$ is simplified to $\{\} \land \{x = 1\} \Rightarrow \{\}$, and then removed from the set of clauses.
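
The simplification step described above can be sketched in OCaml as follows. This is only an illustration under strong assumptions: predicates are plain strings, and "trivially follows" is approximated by syntactic membership in the left-hand side, which is certainly weaker than what GraTen does. The record type `clause` and the function names are hypothetical.

```ocaml
(* A clause  ctx /\ lhs => rhs, with predicates kept as plain strings. *)
type clause = { ctx : string list; lhs : string list; rhs : string list }

(* Hypothetical stand-in for "trivially follows": syntactic membership only. *)
let trivially_follows (c : clause) (p : string) : bool =
  List.mem p c.ctx || List.mem p c.lhs

(* Drop trivially implied right-hand-side predicates, then drop the clauses
   whose right-hand side has become empty. *)
let simplify (cs : clause list) : clause list =
  cs
  |> List.map (fun c ->
         { c with rhs = List.filter (fun p -> not (trivially_follows c p)) c.rhs })
  |> List.filter (fun c -> c.rhs <> [])

(* The clause {} /\ {x = 1} => {x = 1} from the text, plus one that survives. *)
let example =
  [ { ctx = []; lhs = [ "x = 1" ]; rhs = [ "x = 1" ] };
    { ctx = [ "s = 1" ]; lhs = [ "q(v)" ]; rhs = [ "r(v)" ] } ]

let () = assert (List.length (simplify example) = 1)
```

A real implementation would replace `trivially_follows` with a semantic entailment check; the two-pass structure (filter the right-hand side, then filter the clauses) is the part this sketch is meant to show.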

To illustrate the behavior of Algorithm 1, consider applying it to the clauses (1). During the first iteration of the while loop (Line 4), the first for loop (Line 6) exits with an empty θ as r appears on the right-hand side of multiple clauses and cannot be resolved here due to the check at Line 7. In the next for loop (Line 11), θ is updated to:

$$[q\_{s, \nu}(\nu) \mapsto \left(\mathtt{len}(\nu.\mathtt{shape}) = 1 \land q'\_{s, \nu}(\nu)\right)] \tag{2}$$

where $q'\_{s,\nu}(\nu)$ is a fresh predicate variable, and the constraints $c$ would be updated as follows.

$$\begin{aligned} &\{p\_s(s), \mathtt{len}(x.\mathtt{shape}) = 1, q'\_{s,x}(x), s = 1\} \land \{\mathtt{len}(\nu.\mathtt{shape}) = 1 \land q'\_{s,\nu}(\nu)\} \Rightarrow r\_{s,x,\nu}(\nu) \\ &\{p\_s(s), \mathtt{len}(x.\mathtt{shape}) = 1, q'\_{s,x}(x), s \neq 1\} \land \{\nu.\mathtt{shape} = [\mathtt{nth}(0, x.\mathtt{shape})/s]\} \Rightarrow r\_{s,x,\nu}(\nu) \end{aligned}$$

The while loop exits after the second iteration, as no new predicate variables can be added to $\theta$ and $c = c'$ holds. Thus, we only obtain (2) from Algorithm 1. After the inference, GraTen assigns true to the remaining predicate variables $p$, $q'$ and $r$.
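
The last step of this example, where the partial solution (2) is applied and every remaining predicate variable defaults to true, can be sketched as follows. The datatype `pred` and the function `close` are hypothetical names introduced for this illustration, predicate-variable parameters are omitted, and the substitution is assumed to be acyclic.

```ocaml
(* Predicates: true, an atomic formula, a conjunction, or a predicate variable. *)
type pred =
  | True
  | Atom of string
  | And of pred * pred
  | PVar of string

(* Apply a partial solution theta; unresolved variables default to True.
   Assumes theta is acyclic, otherwise this function does not terminate. *)
let rec close (theta : (string * pred) list) (p : pred) : pred =
  match p with
  | True | Atom _ -> p
  | And (a, b) -> And (close theta a, close theta b)
  | PVar x ->
    (match List.assoc_opt x theta with
     | Some q -> close theta q
     | None -> True)

(* Solution (2): q is mapped to len(v.shape) = 1 /\ q'. *)
let theta = [ ("q", And (Atom "len(v.shape) = 1", PVar "q'")) ]

let () =
  assert (close theta (PVar "q") = And (Atom "len(v.shape) = 1", True));
  assert (close theta (PVar "r") = True)
```

Defaulting to `True` is the weakest choice for a refinement, which is why it cannot invalidate a consistent subtyping constraint.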

Algorithm 1 Algorithm for calculating the solutions θ to predicate variables from constrained Horn clauses c.


# 5 Experiment

This section reports on experiments to evaluate the effectiveness of our approach by running our tool GraTen on the example programs bundled in the OCaml-Torch library [4]. We have also checked how type annotations changed the inference results.

# 5.1 Methods

Input and Output of GraTen. GraTen takes an OCaml program and performs type checking with its best-effort type inference. If the type checking is successful, it returns the inferred types of top-level variables defined in the program, and the source program with necessary assertions inserted. Otherwise, the type checking fails with an error message.

The assertions are inserted into the output program only when they are needed. Namely, assertions are inserted at the places where the consistent subtyping $\Gamma; \varphi \vdash \tau\_1 \lesssim \tau\_2 \rightsquigarrow N$ is used only when $\Gamma; \varphi \vdash \tau\_1 <: \tau\_2$ does not hold (see Proposition 1).

Besides the source program, GraTen also reads the types of the library functions (including those of OCaml-Torch) from manually prepared stub files. For example, the type of tr (matrix transpose function) is defined as follows.

```
val tr : x:{ v:tensor | len v.shape = 2 }
       -> tensor([nth 1 x.shape; nth 0 x.shape])
```
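
To make the reading of this stub concrete, here is a small OCaml sketch of its shape-level meaning, assuming shapes are modelled as integer lists. The function `tr_shape` is a hypothetical helper and not part of GraTen or OCaml-Torch; it returns `None` where the refinement `len v.shape = 2` on the argument would be violated.

```ocaml
(* Shapes are modelled as lists of dimension sizes. *)
type shape = int list

(* Shape-level reading of the stub for tr: the argument must satisfy
   len v.shape = 2, and the result has shape [nth 1 x.shape; nth 0 x.shape]. *)
let tr_shape (s : shape) : shape option =
  if List.length s = 2 then Some [ List.nth s 1; List.nth s 0 ] else None

let () =
  assert (tr_shape [ 3; 5 ] = Some [ 5; 3 ]);
  assert (tr_shape [ 3; 5; 7 ] = None)
```
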
Note that describing the types of some higher-order OCaml-Torch functions requires the polymorphic extension, which we sketch in the full version [13]. For example, the type of Layer.forward is defined as follows.

```
∀b1:bool, b2:bool.
  (x:{x:tensor | b1} → {y:tensor | b2}) → x:{x:tensor | b1} → {y:tensor | b2}
```
GraTen handles such types by instantiating the quantified parameters ($b\_1$ and $b\_2$ in the above case) with fresh predicate variables.

Test Cases. We applied GraTen to the programs under the examples/ directory of the OCaml-Torch repository<sup>8</sup>. The list of programs tested is shown in Table 1. Since some programs use features of OCaml or OCaml-Torch that are not yet supported by GraTen, they were modified to avoid such features without changing the structure of the neural network. The major modifications made to the target programs are listed below. Other smaller syntactic modifications can be found in the supplementary materials.


<sup>8</sup> https://github.com/LaurentMazare/ocaml-torch/tree/a6499811f4/examples

<sup>9</sup> Functions that take a tensor and return a tensor.

replaced with a variant Tensor.cat\_ which takes only two tensors. The other is Layer.sequential, which takes a list of layers and returns a layer that sequentially applies all the input layers.

(M3) Replacing mutable float objects with 0-dimensional tensors, as GraTen does not support reference types.

As an example of (M1) and (M2), consider the following function, which creates a list of linear layers and returns a new layer that applies all the layers in the list.

```
let f vs ~num_layers =
  List.init num_layers ~f:(fun i -> Layer.linear vs ~input_dim:(i+1) (i+2))
  |> Layer.sequential
```
The i-th layer in the list takes a tensor whose last dimension is size i+1, and returns a tensor of the same shape except that the last dimension is changed to i+2. By the modifications (M1) and (M2), the above function definition is replaced with:

```
let f vs ~num_layers =
  let rec loop i xs =
    if i = 0
    then Layer.id xs
    else loop (i-1) xs ~is_training |> Layer.linear vs ~input_dim:i (i+1)
  in Layer.of_fn (loop num_layers)
```
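
The shape behavior of the rewritten function can be checked with a small stand-alone OCaml sketch that tracks shapes only. It assumes shapes are integer lists and uses the standard library rather than OCaml-Torch; `linear_shape` and `loop_shape` are hypothetical helpers that mirror `Layer.linear` and `loop` at the level of shapes.

```ocaml
type shape = int list

(* Shape effect of a linear layer: the last dimension must equal input_dim
   and is replaced by output_dim. *)
let linear_shape ~input_dim output_dim (s : shape) : shape option =
  match List.rev s with
  | last :: rest when last = input_dim -> Some (List.rev (output_dim :: rest))
  | _ -> None

(* Mirrors the recursive loop: apply the first i layers, innermost first. *)
let rec loop_shape i (s : shape) : shape option =
  if i = 0 then Some s
  else Option.bind (loop_shape (i - 1) s) (linear_shape ~input_dim:i (i + 1))

let () =
  (* Three layers map a last dimension of 1 to 4. *)
  assert (loop_shape 3 [ 8; 1 ] = Some [ 8; 4 ]);
  (* A last dimension of 2 does not fit the first layer. *)
  assert (loop_shape 3 [ 8; 2 ] = None)
```
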
Some programs in the examples/ directory are excluded from the test cases for the following reasons.

– neural\_transfer uses a library function Vgg.vgg16\_layers whose type cannot be described in GraTen; the relation between its inputs and its output tensor's shape could not be expressed in the syntax supported by GraTen.

– Programs dqn.ml, dqn\_atari.ml and dqn\_pong.ml in reinforcement-learning use queues which are not supported in GraTen yet.

– env\_gym\_pyml.ml and venv\_env\_gym\_pyml.ml under reinforcement-learning use Python objects whose verification is outside the scope of this paper.

– reinforcement-learning/policy\_gradient.ml uses mutable lists which cannot be replaced with another datatype already supported in GraTen.

– yolo/darknet.ml and translation/lang.ml use hash tables which are not supported in GraTen yet.

– translation/dataset.ml and translation/lang.ml are irrelevant as tensor objects do not appear in them.

Evaluation. We evaluated the best-effort inference of GraTen on the following three aspects.

First, we counted the assertions inserted into the original program when GraTen is used for the target program. Since the assertions indicate the program points that could fail at runtime, the user of GraTen would wish to pay attention to the location and the number of inserted assertions and try to decrease them.

Second, we counted the minimum number of type annotations required to type-check the program with minimum assertions inserted. This is for evaluating the realistic programmers' burden of trying to statically verify the program with type annotations. The annotations were added in such a way that the types of the functions do not lose the original generality. The type annotations are counted by the number of refinement types with non-true refinement predicates in them. For example, the following annotation counts as 3 because the refinement of the input tensor and the two output tensors are not true, but the refinement of the annotation of the second argument bool is true.

```
tensor([x]) -> ~is_training:bool -> tensor([x]) * tensor([x])
```
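
The counting rule can be made precise with a small OCaml sketch over a toy annotation syntax. The datatype `ty` and the function `count_annotations` are hypothetical and only illustrate the rule; a refinement of `None` stands for the predicate true.

```ocaml
(* A tiny annotation syntax: base types with an optional refinement,
   function types, and pairs. *)
type ty =
  | Base of string * string option   (* e.g. Base ("tensor", Some "v.shape = [x]") *)
  | Arrow of ty * ty
  | Pair of ty * ty

(* Count the refinement types whose predicate is not true. *)
let rec count_annotations = function
  | Base (_, None) -> 0
  | Base (_, Some _) -> 1
  | Arrow (a, b) | Pair (a, b) -> count_annotations a + count_annotations b

(* tensor([x]) -> ~is_training:bool -> tensor([x]) * tensor([x]) *)
let example =
  let t = Base ("tensor", Some "v.shape = [x]") in
  Arrow (t, Arrow (Base ("bool", None), Pair (t, t)))

let () = assert (count_annotations example = 3)
```
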
Third, we also measured the time taken by GraTen to analyze the unannotated and annotated programs. The experiments were conducted on a Linux machine with 12-core Intel i5-11400 (2.60GHz) and GraTen is implemented in Haskell with GHC version 9.0.2.

# 5.2 Experimental Results

Table 1 summarizes the experimental results. We analyze those results from the following three aspects: assertions, type annotations and analysis time.

Inserted Assertions. Out of the 26 programs tested, 10 programs required no type annotations to type-check without assertions, and another 7 programs type-checked without assertions after appropriate type annotations were added. For the remaining 9 programs such as gan/began.ml and gan/gan\_stability.ml, we could not eliminate all assertions, although some of them were removed after adding type annotations. The remaining assertions were due to the imprecise type signatures of some library functions. For instance, Torch.Serialize.load is a function that loads a tensor from a file and its type signature is defined as follows.

```
val load : ~filename:string -> tensor
```
The return type of load is simply defined as tensor since it is impossible to assume any properties about its shape. As a result, an assertion was inserted to check if the loaded tensor satisfies the requirement to run the program without uncaught errors. Even adding type annotations to the loaded tensor does not remove the assertion.

Some other functions are given imprecise types due to GraTen's immature support of polymorphic data types. For example, the type of Tensor.stack is defined as follows because GraTen does not effectively support non-integer lists yet. Refining the return types of such functions is left as future work.

```
val stack : ~dim:int -> list (tensor) -> tensor
```


Table 1. Results of running GraTen on the test cases. The second column is the size of the program after the modification. The third and fourth columns are the results for unannotated programs. The third column is the duration of the type-checking and the fourth column is the number of assertions inserted. From the fifth to the seventh columns are for the annotated programs. The fifth column is the number of annotations added to the program.

Patterns of Added Type Annotations. As we added type annotations to the test cases, we observed that the program points that require type annotations have similarities. All of the type annotations fall into one of the following patterns.


```
let rec loop
  : ~state:tensor([1; enc.hidden_size])
  -> ~prevs:list ({ v:tensor | prod v.shape = 1 })
  -> ~max_length:int -> list ({ v:tensor | prod v.shape = 1 })
= fun ~state ~prevs ~max_length -> ...
```
(P3) Higher-order shape-polymorphic arguments. For example, sample in char\_rnn.ml is annotated as follows.

```
let sample ~dataset ~lstm
  ~linear:(linear : x:{ v:tensor | last v.shape = hidden_size }
                 -> tensor(init x.shape @ [dataset.labels]))
  ~device = ...
```

```
let enc_outputs : tensor([1; nth 1 v.shape; enc.hidden_size]) =
  Tensor.stack enc_outputs ~dim:1
```

The statically inferred type of enc\_outputs here is tensor([1; enc.hidden\_size]) list, so we would not need this type annotation if the type signature of Tensor.stack is appropriately defined. Since it is not possible to statically verify the correctness of these types of annotations, assertions would still be inserted after adding these annotations.

The first three patterns indicate that GraTen's current best-effort type inference does not effectively infer precise refinements for branches, recursive functions and higher-order shape-polymorphic arguments. The fourth pattern (P4) would be inevitable when using record types. It remains as future work to exempt users from having to add type annotations for (P5). With such improvements, we believe that it will become easier to find program points that require type annotations for better inference.

Number of Type Annotations. There is no clear correlation between the number of assertions inserted into the unannotated program and the number of annotations needed to minimize the number of assertions.

For example, adding two type annotations to gan/gan\_stability.ml resulted in removing 38 assertions. This is because GraTen inferred an imprecise type for a helper function resnet\_block without any type annotations, and it degraded the precision of the inference for the 24 callers of the function. Meanwhile, translation/seq2seq.ml required comparatively many type annotations as it has many definitions of record types and several recursive functions with multiple inputs.

Analysis Time. For all of the 11 annotated programs, GraTen's type checking of the annotated program was faster than that of its unannotated counterpart. This is presumably because having more static information made it easier for GraTen to infer more precise types and to resolve more subsumption constraints.

# 5.3 Discussions

In this subsection, we discuss the strengths, weaknesses and our perspective on the future development of our system.

Performance of Best-Effort Inference. As reported in the previous subsection, the best-effort inference of GraTen does not infer precise types for branches, recursions and higher-order shape-polymorphic arguments. While this may seem unsatisfying at first glance, the aim of this research is not to develop a perfect inference algorithm, but to propose a method that can work on unannotated programs and allows users to work interactively with the type checker to gradually add type annotations. In this respect, we believe that GraTen has achieved desirable results since it will be easy for the user to find out where to add type annotations. This is because (1) the inserted assertions can inform the user of the location of potential dynamic errors, and (2) all of the required type annotations would fall into one of the patterns listed in the previous subsection and thus should be predictable.

Lists of Tensors and Layers. As of now, the refinement inference for lists in GraTen is limited to integer lists. Meanwhile, lists of tensors or lists of functions are commonly used in deep learning programs: Tensor.cat and Tensor.stack both take a list of tensors and return their concatenation, and Layer.sequential takes a list of layers (functions that take and return a tensor) and returns their composition.

A potential approach to support these library functions would be to add new refinement predicates for tensor lists or layer lists. For example, we can add a predicate composable(x, S1, S2) which means that the composition of a list of layers x takes a tensor of shape $S\_1$ and returns a tensor of shape $S\_2$. The type of Layer.sequential would be expressed with the shape polymorphic extension (see the full version [13]) as follows.

```
val sequential : forall S1 S2.
  { v:list(tensor -> tensor) | composable(v,S1,S2) }
                                   -> tensor(S1) -> tensor(S2)
```
To practically infer the composable predicate for layer lists, we would need to change the type-instantiated versions of list-manipulating functions as well. For instance, the type of the cons function for layers would need to be defined as follows.

```
val cons_layers
  : forall S1 S2 S3. (tensor(S1) -> tensor(S2))
  -> { v:list(tensor -> tensor) | composable(v, S2, S3) }
  -> { v:list(tensor -> tensor) | composable(v, S1, S3) }
```
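
The intended meaning of composable and of the type of cons\_layers can be illustrated with the following OCaml sketch. It is an assumption-laden model, not GraTen's implementation: a layer is represented only by a fixed input shape and output shape (real layers are functions on tensors and may be shape-polymorphic), and shapes are integer lists.

```ocaml
type shape = int list

(* A layer is modelled only by its input and output shapes (an assumption). *)
type layer = { input : shape; output : shape }

(* composable layers s1 s2: applying the layers in order maps s1 to s2. *)
let rec composable (layers : layer list) (s1 : shape) (s2 : shape) : bool =
  match layers with
  | [] -> s1 = s2
  | l :: rest -> l.input = s1 && composable rest l.output s2

(* Shape-level view of cons_layers: a layer from s1 to s2 in front of a list
   composable from s2 to s3 gives a list composable from s1 to s3. *)
let cons_layers (l : layer) (rest : layer list) : layer list = l :: rest

let () =
  let a = { input = [ 1; 4 ]; output = [ 1; 8 ] } in
  let b = { input = [ 1; 8 ]; output = [ 1; 2 ] } in
  assert (composable [ b ] [ 1; 8 ] [ 1; 2 ]);
  assert (composable (cons_layers a [ b ]) [ 1; 4 ] [ 1; 2 ]);
  assert (not (composable (cons_layers b [ b ]) [ 1; 8 ] [ 1; 2 ]))
```
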
Reporting Incorrect Type Annotations. Since our type system sees the standard refinement types as gradual, some users might find the behavior of GraTen unexpected in some cases. Consider the following function f which takes a matrix and returns a matrix obtained by transposing the input. Suppose that the programmer mistakenly annotated the return value of f to have the same shape as the input matrix.

```
let f x = (tr x : tensor(x.shape))
```

Although this type annotation does not hold in general, this program is not rejected by our type system because the annotation can hold if the input x is a square matrix. GraTen would output the following program with an assertion.

```
let f x = (fun y -> assert(y.shape = x.shape); y) (tr x)
```
To avoid such a situation, it would be possible to extend the type system with types with fully statically known refinements, and let the annotated types be interpreted as such.
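
To see concretely how the mis-annotated function f behaves at runtime, the following OCaml sketch runs the program with the inserted assertion on a square and a non-square matrix. Tensors are modelled only by their shape, and `tr` and `passes` are stand-ins written for this illustration rather than the OCaml-Torch functions. The sketch relies on `assert` being enabled, that is, on the program not being compiled with `-noassert`.

```ocaml
(* Tensors are modelled only by their shape (an assumption for illustration). *)
type tensor = { shape : int list }

(* Transpose of a matrix at the shape level. *)
let tr (x : tensor) : tensor =
  match x.shape with
  | [ r; c ] -> { shape = [ c; r ] }
  | _ -> invalid_arg "tr: expected a matrix"

(* The program with the inserted assertion, in the form shown in the text. *)
let f x = (fun y -> assert (y.shape = x.shape); y) (tr x)

(* true if f returns normally, false if the inserted assertion fails. *)
let passes x = match f x with _ -> true | exception Assert_failure _ -> false

let () =
  assert (passes { shape = [ 3; 3 ] });
  assert (not (passes { shape = [ 2; 3 ] }))
```

The mistake is thus only detected when f is called with a non-square matrix, which is the situation that statically known refinements would rule out earlier.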

# 6 Related Work

Tensor Shape Checking in Deep Learning Programs. The problem of tensor shape checking has been studied for decades in various contexts, such as numerical analysis [7,2] and array-oriented languages with rank polymorphism [29,28,12]. Tensor shape checking for deep learning programs is still a new challenge because the shapes can be more complicated, and a variety of methods have been proposed both in academia and in industry.

Some tools statically check tensor shapes with advanced type systems. Hasktorch [3] is a Haskell binding of libtorch [20] which provides a mode that statically checks tensor shapes. Since they use the type-level programming feature of Haskell to implement the tensor shapes, tensor shapes are not first-class objects. As a result, programs such as the one in Figure 1 cannot be expressed since it is impossible to define the function f whose type depends on the first-class object s. Relay [25,24] is an IR for deep learning compilers with a rich type system for tensor shape with type inference. Both Relay and Hasktorch support dynamic shape as a wild card in the static shape checking.

Apart from the type-based verification methods, some tensor shape error detection tools also take a static approach. Pythia [17,6] statically detects shape faults in TensorFlow [1] programs by keeping track of the tensor shapes throughout the program using value-flow analysis. The tracking of shapes is done in a best-effort manner, allowing the shape inference results to be "unknown" in some cases. The analysis crucially relies on the programming practice in TensorFlow to annotate tensor shapes as much as possible.

Other static checking tools use symbolic execution to collect constraints from the program and verify them with a solver; Tensors Fitting Perfectly [21] and PyTea [15] take this approach. Both methods remove loops from the program in an ad-hoc manner based on a reasonable assumption about the program.

Lastly, some took dynamic approaches to provide lightweight shape fault detection. ShapeFlow [31] is an abstract interpreter of TensorFlow programs; it shares the same APIs as TensorFlow but only calculates the shape of tensors. Users can run the analysis by replacing the import of TensorFlow with ShapeFlow in the target program, which executes more efficiently than the original TensorFlow program. Elichika [14] uses a similar method to ShapeFlow with a feature to display the interpreted shapes with a symbolic expression. These dynamic approaches enable quick analysis and require no type annotations, but provide no guarantee for untested inputs.

Static and Dynamic Checking for Refinement Types. Earlier work on dependent type systems focused on decidable type checking and inference with restricted refinement logic [10,34,33,26]. Dynamic checking with contracts [19,9] offers expressive verification that cannot be covered by a static type system, but at the cost of runtime overhead. Naturally, the combination of static and dynamic checking has been actively explored by the successors of both parties.

Hybrid type checking [16], which our work is based on, extends the purely dynamic method of using contracts by verifying specifications statically as much as possible. This method differs from ours in that it inserts a dynamic check only when the subtyping constraint is not proven to be valid or invalid. As a result, this method statically rejects the incorrectly annotated program that we discussed in Subsection 5.3, while our method accepts it with a dynamic check in the hope that a more precise type annotation will remove the need for a dynamic check. Our method can be understood as a variant of hybrid type checking with a focus on being gradual in adding type annotations.

The application of gradual typing to dependent type systems has also been studied [18,8]. In particular, gradual refinement types [18] is very similar to our type system in that it gradualizes only the predicate part of a refinement type system and the underlying simple type is static. One of the differences is that their system distinguishes statically-unknown refinement predicates from statically-known ones, while our system assumes that any refinement predicate can have a statically-unknown portion. For example, consider the following program:

$$\mathbf{let} \; f \; x \; (y : \{ \nu : \mathbf{int} \mid \mathbf{true} \}) = x/y$$

This program is rejected in their system because the type annotation of $y$ indicates that the programmer is confident that $y$ can be any integer including 0; otherwise, the type annotation should have been $\{\nu : \mathbf{int} \mid {?}\}$. Meanwhile, our system interprets the type annotation as not precise enough and accepts the program by inserting a dynamic check on $y$. Intuitively, $\{x : B \mid \varphi\}$ in our type system translates to $\{x : B \mid \varphi \land {?}\}$ in gradual refinement types [18].
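
A minimal OCaml sketch of what the accepted program could look like after the dynamic check is inserted is given below. It assumes that the division primitive requires a non-zero divisor, so that the inserted check is `y <> 0`; the exception name `Cast_error` is hypothetical and only marks the failure of that check.

```ocaml
(* Hypothetical exception standing for a failed inserted check. *)
exception Cast_error

(* f with the dynamic check on y that the annotation {v:int | true}
   cannot discharge statically. *)
let f x y =
  if y = 0 then raise Cast_error else x / y

let () =
  assert (f 6 3 = 2);
  assert (match f 1 0 with _ -> false | exception Cast_error -> true)
```
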

Type inference for gradual refinement types has been studied by Vazou et al. [30]. Their work restricts refinements to liquid predicates [26] to maintain decidability, while our work does not impose such a limitation.

# 7 Conclusion and Future Work

We presented an extension to the standard refinement type system which can be viewed as a gradual type system. The essence of this extension is the introduction of the consistent subtyping relation, which inserts into the source program assertions that check statically-unverified properties at runtime. We also showed that the extended type system satisfies the refined criteria of gradual typing.

We then applied this type system to the verification of tensor shapes with best-effort type inference. This application makes use of the property of the proposed type system that allows us to compensate for the limitations of the static best-effort analysis with dynamic checks. We also implemented a prototype type checker, GraTen, and applied it to some of the example programs publicly available in the OCaml-Torch repository. We observed that, thanks to the best-effort type inference, users would not need to write many type annotations to statically type-check the whole program, and that it would not be difficult to find where to add type annotations to improve the inference.

We conclude with some ideas for future work.

– Extension with type polymorphism. As we observed in the experiments, type polymorphic functions are frequently used in realistic programs. Extending our type system with ML-style type polymorphism would make the type checker more practical.

– Application to imperative languages with a dynamic type system, like Python. In this paper, we have chosen OCaml as the target of the prototype to ensure that the input program is statically typed. Python would, however, be a more attractive target since it is widely used in the machine learning community.

Acknowledgments. We would like to thank the anonymous referees for useful comments. This work was supported by JSPS KAKENHI Grant Number JP20H05703.

# References



# A Type System for Effect Handlers and Dynamic Labels

Paulo Emílio de Vilhena and François Pottier

Inria, Paris, France {paulo-emilio.de-vilhena,francois.pottier}@inria.fr

Abstract. We consider a simple yet expressive λ-calculus equipped with references, effect handlers, and dynamic allocation of effect labels, and whose operational semantics does not involve coercions or rely on type information. We equip this language with a type system that supports type and effect polymorphism, allows reordering row entries and extending a row with new entries, and supports (but is not restricted to) lexically scoped handlers. This requires addressing the issue of potential aliasing between effect names. Our original solution is to interpret a row not only as a permission to perform certain effects but also as a disjointness requirement bearing on effect names. The type system guarantees strong type soundness: a well-typed program cannot crash or perform an unhandled effect. We prove this fact by encoding the type system into a novel Separation Logic for effect handlers, which we build on top of Iris. Our results are formalized in Coq.

# 1 Introduction

Effect handlers [30,17] can be viewed as a generalization of exception handlers. Like raising an exception, performing an effect interrupts the normal flow of execution and transfers control to a handler. Unlike an exception handler, an effect handler gains access to a delimited continuation, which represents the fragment of the evaluation context comprised between the point where the effect was performed and the point where the effect handler was installed. Invoking this continuation resumes the computation whose execution was suspended by performing an effect.

To allow programmers to exploit several independent effects simultaneously, it is desirable for effects to have names. Each effect handler handles a specific name, or a specific set of names. When an effect is performed, the name of this effect determines which handler is selected. This idea immediately gives rise to several key questions about names. What are they: strings, variables, addresses? Where are they defined? What is their scope?

In the simplest approach [2,14,22], effect names are global. All possible names are predefined and are in scope everywhere. This approach is simple but unsatisfactory in terms of expressiveness and modularity: an accidental collision, where two unrelated pieces of code happen to use the same effect name, can have surprising unintended consequences. We illustrate this problem later on (§2).

To remedy this problem, several authors have proposed to change the nature of names. Their work falls broadly in two categories: the "lexical approach" and the "generative approach".

The "lexical approach" introduces local effect names with lexical scope. One can then think of an effect name essentially as a variable. Tunneled exceptions [42] and lexically scoped handlers [41,6,7,27] fall under this approach. In some of these proposals, the local effect name is never exposed to the user, but a "capability" to perform the effect is made available via a local variable. A potential pain point of this approach is that one must somehow ensure that a name or capability cannot escape its scope: this must be guaranteed by some combination of syntactic restrictions, runtime tests, and static typing rules.

The "generative approach" consists in allowing new effects to be generated afresh at runtime. This requires introducing a distinction between effect labels, which are allocated at runtime, and effect names, which are variables (with lexical scope) that the programmer uses to refer to effect labels. This is similar to the distinction between memory locations and variables that is traditionally used in the operational semantics of mutable references [29]. This approach has long been in use for exceptions in Standard ML [25] and OCaml [24], and is used also for effects in OCaml 5. It is powerful: in particular, it can simulate lexically scoped handlers.<sup>1</sup> However, it introduces several pitfalls of its own. First, it creates the possibility of nameless effects, that is, the possibility that there is no static effect name for a certain effect label. Second, it introduces the possibility of aliasing between effect names, that is, the possibility that two distinct effect names denote the same effect label. Aliasing creates a challenge for type system designers: if one cannot statically tell whether two effect names denote distinct labels, then it seems unclear how one can propose a sound and precise type discipline.

At least three ways of evading or addressing this challenge appear in the literature.

First, several mainstream languages adopt the generative approach but avoid the aliasing challenge by offering a weak type soundness guarantee: a well-typed program cannot crash, but can halt due to an unhandled exception or effect. This is the case in Standard ML, where exceptions are untracked, and in OCaml, where exceptions and effects are untracked. It is also the case in Eff [3].

Second, a number of authors evade or resolve the aliasing challenge by altering the syntax and the operational semantics of the language. Instead of letting the correspondence between an effect and a handler be determined purely by the notion of equality of effect labels or effect names, they introduce coercions that enable explicit disambiguation and collision avoidance. Examples include Koka [21] as well as several papers by Biernacki et al. [4,5].

<sup>1</sup> This can be a source of confusion. A language that has "lexically scoped handlers" can, technically, be presented in either of these two styles. Biernacki et al. [6] present one semantics in each style, the "open semantics" and the "generative semantics", and prove an equivalence between them. Zhang and Myers [41] adopt what we believe is a combination of lexically scoped handlers and implicit arguments, which they refer to as "tunneling", in their surface language. This language is then translated down to a core language whose operational semantics is in the generative style.

Third, some authors evade the challenge by restricting the programming language in one or more ways, such as restricting attention to lexically scoped handlers [6,7] and forbidding first-class functions [7].

This sets the scene for this paper. We stick with the generative approach, which offers a simple and expressive semantics. We do not introduce coercions or otherwise alter the operational semantics. We do not restrict our attention to lexically scoped handlers. We address the aliasing challenge.

We propose Tes, a type-and-effect system that statically rules out unhandled effects. As in most previous work, the potential effects of an expression are described by a row, a concept introduced to type-check records and variants [32,38] and later applied to the analysis of exceptions [28] and effects [14,22]. Type and effect polymorphism are supported. Furthermore, a simple and powerful subsumption relation allows reordering the entries in a row and extending a row with new entries, without any side conditions.

How is this possible? How is the aliasing challenge addressed? Our key idea is this: whenever a question about aliasing arises, require absence of aliasing. In other words, we interpret a row not just as a description of the names and types of the effects that may be performed, but also as a requirement that these names be pairwise distinct. For instance, if a typing judgment states that an expression e has effect (s : ι ⇒ κ) · (s′ : ι′ ⇒ κ′), then this means not only that e may perform the effects s and s′, but also that e requires the effect labels denoted by s and s′ to be distinct. In the presence of effect polymorphism, if e has effect (s : ι ⇒ κ) · θ, where θ is a row variable, then we take this to mean that e requires the effect label denoted by s to lie outside the set of effect labels denoted by θ. We adapt our typing and subtyping rules, where needed, so as to be sound with respect to this new interpretation of rows.

The reader may find our approach somewhat reminiscent of the manner in which the separating conjunction of Separation Logic [31] requires disjointness between the footprints of two formulae. Although this requirement may at first seem strong, experience has shown that Separation Logic is in fact concise and expressive. The examples that we present in Section 4.4 seem to suggest that our disjointness requirement is acceptable; we have not yet found examples where it is problematic. That said, we do not yet have practical experience with an implementation of this type system.

Tes offers a strong type soundness guarantee: a well-typed program cannot crash and cannot halt due to an unhandled effect. To prove this fact, we follow a semantic approach that has become popular in the last few years [1,20,19]. We introduce TesLogic, a novel variant of Separation Logic, constructed on top of Iris [16], which allows reasoning about programs in the presence of effects and handlers, multi-shot continuations, and dynamic allocation of effect labels. We prove that this logic is sound, and we provide an interpretation of Tes's typing rules in terms of TesLogic's reasoning rules. All of our results are formalized in Coq, and our Coq formalization is available [36].

In summary, the main contributions of this paper are the design of Tes, a type system for TesLang, a λ-calculus equipped with general references, effect handlers, and dynamic allocation of effect labels, and a proof of type soundness, which is carried out via a semantic interpretation into a new program logic, TesLogic.

In Section 2, we provide more background and examples about the semantics of effect handling: we discuss name collisions, effect coercions, lexically scoped handlers, and dynamic allocation of effect labels, and we justify why we wish to study a calculus where effect handling and dynamic allocation of effect labels are separate constructs. In Section 3, we present the syntax and operational semantics of TesLang. In Section 4, we introduce Tes and show a number of examples of constructions that Tes is able to type-check. In Section 5, we present a brief overview of the proof of type soundness. Finally, we discuss the related work and conclude.

# 2 A Panorama of Semantics for Effect Handlers

The various mechanisms that we have mentioned so far, namely lexically scoped handlers, dynamic allocation of effect labels, and effect coercions, aim to resolve the basic problem of accidental collisions between effect names. Let us illustrate this problem with an example.

Anticipating Section 3, we use a λ-calculus equipped with constructs to perform and handle effects. The expression perform s v performs an effect with effect name s and payload v. The expression handle e with s : h | r installs an effect handler which monitors the execution of the subexpression e and which handles the effects that carry the name s.<sup>2</sup> If e returns a value v, then the return branch r is invoked and receives the value v as an argument. If e performs an effect with name s and with payload v, then the execution of e is suspended and control is transferred to the effect branch h, which receives the payload v and a continuation k representing the suspended computation.

Let us now introduce the function bad\_counter. In a system of simple types, which does not keep track of effects, bad\_counter expects a function ff of type (α → β) → γ and returns a function of type (α → β) → γ × int. The intended behavior of bad\_counter ff is to produce a new function ff′ such that ff′ behaves like ff but at the same time counts how many times ff uses its argument. That is, for an arbitrary function f, the application ff′ f is expected to return a pair (v, n), where v is the result of the computation ff f and n is the number of invocations of f that have taken place during this computation. The function bad\_counter is defined as follows:

$$\mathsf{bad\_counter}\ \mathit{ff} = \lambda f. \begin{pmatrix} \mathsf{handle}\ \mathit{ff}\ (\lambda x.\ \mathsf{perform}\ \mathit{tick}\ ();\ f\ x)\ \mathsf{with} \\ \mathit{tick} : \lambda \_\, k.\ \lambda n.\ k\ ()\ (n+1) \mid \lambda y.\ \lambda n.\ (y, n) \end{pmatrix}\ 0$$

This code has a free effect name, tick. The function f is wrapped in a proxy which performs an effect named tick. This effect is handled by bad\_counter; the handler implements a memory cell (in state-passing style) to count the number of ticks, that is, the number of calls made by ff to f.

<sup>2</sup> For simplicity, this construct selects just one name, as opposed to a set of names.

Unfortunately, because this function uses a fixed effect name, tick, it can exhibit an unintended behavior, caused by an accidental collision of effect names. The following use of bad\_counter exhibits this issue:

$$\mathsf{bad\_counter}\ (\mathsf{bad\_counter}\ (\lambda f.\ f\ ()))\ (\lambda \_.\ ())$$

Because the function λf. f () calls its argument once, one might expect the above expression to return (((), 1), 1). Its actual result, however, is (((), 2), 0). In the interest of space, we omit an explanation of its operational behavior. The key reason why it behaves incorrectly is that the two instances of bad\_counter use the same effect name. Each application of bad\_counter installs a handler for the effect name tick. One handler is nested inside the other. As a result, the innermost handler intercepts two tick effects and the outermost handler never observes any effect, whereas what was naively intended was that each handler observes and handles one effect. As a result of the name collision, one of the effects is accidentally handled by the innermost handler.
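Since we omit the step-by-step explanation, the following Python sketch may help the reader replay the collision. It is a minimal model of deep handlers under our own assumptions, not part of the formal development: a computation is either `Return(v)` or a suspended effect `Eff(name, payload, k)`, and the helper names `perform`, `bind`, `handle`, and `run` are ours.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Return:
    """A computation that has finished with a value."""
    value: Any


@dataclass
class Eff:
    """A suspended computation: effect `name` was performed with `payload`;
    `k` resumes the rest of the computation when given a reply."""
    name: Any
    payload: Any
    k: Callable[[Any], Any]


def perform(name, payload):
    # Performing an effect suspends with an empty continuation.
    return Eff(name, payload, Return)


def bind(m, f):
    # Sequencing: run m, then pass its result to f.
    if isinstance(m, Return):
        return f(m.value)
    return Eff(m.name, m.payload, lambda x: bind(m.k(x), f))


def handle(m, name, h, r):
    # Deep handler: the continuation given to h reinstalls the handler.
    if isinstance(m, Return):
        return r(m.value)

    def resume(x):
        return handle(m.k(x), name, h, r)

    if m.name == name:
        return h(m.payload, resume)
    return Eff(m.name, m.payload, resume)  # not ours: propagate outward


def run(m):
    if not isinstance(m, Return):
        raise RuntimeError("unhandled effect")
    return m.value


def bad_counter(ff):
    def counted(f):
        proxy = lambda x: bind(perform("tick", ()), lambda _: f(x))
        # State-passing handler: each branch returns a function of the count n.
        h = lambda _, k: Return(lambda n: bind(k(()), lambda g: g(n + 1)))
        r = lambda y: Return(lambda n: Return((y, n)))
        return bind(handle(ff(proxy), "tick", h, r), lambda g: g(0))
    return counted


result = run(bad_counter(bad_counter(lambda f: f(())))(lambda _: Return(())))
print(result)  # (((), 2), 0)
```

Both instances of `bad_counter` use the fixed name `"tick"`, so the inner handler, which its own continuation reinstalls, also intercepts the tick performed by the outer proxy.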

To avoid or help avoid accidental collisions between names, the literature describes several mechanisms: (1) effect coercions, (2) lexically scoped handlers, which can be viewed as a restricted case of (3) dynamic allocation of effect labels. Let us now say a little more about these mechanisms.

Effect coercions. An effect coercion modifies the manner in which an effect is matched with one of the enclosing handlers. Perhaps the simplest example is that of the lift coercion [4,5], but there are other forms of coercions in the literature, such as swap. Normally, performing an effect named s transfers control to the innermost enclosing handler that selects the name s. However, in a language with effect coercions, if there is a lift coercion between the point where the effect is performed and the innermost enclosing handler, then this handler is skipped and control is transferred instead to the next enclosing handler for the name s. <sup>3</sup> Under such a semantics, a coercion can be employed to write a fixed version of bad\_counter:

$$\mathsf{lift\_counter}\ \mathit{ff} = \lambda f. \begin{pmatrix} \mathsf{handle}\ \mathit{ff}\ (\lambda x.\ \mathsf{perform}\ \mathit{tick}\ ();\ \mathsf{lift}\ \mathit{tick}\ (f\ x))\ \mathsf{with} \\ \mathit{tick} : \lambda \_\, k.\ \lambda n.\ k\ ()\ (n+1) \mid \lambda y.\ \lambda n.\ (y, n) \end{pmatrix}\ 0$$

As desired, lift\_counter (lift\_counter (λf. f ())) (λ\_. ()) returns the value (((), 1), 1). One tick effect is intercepted by the innermost handler; the other effect is intercepted by the outermost handler thanks to the lift coercion. In Biernacki et al.'s λ<sup>HEL</sup> [5], lift\_counter is well-typed. The lift coercion is mandatory; without it, the code would be ill-typed.
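The skipping behavior of lift can be modeled by attaching a counter to each suspended effect. The Python sketch below is our own simplified reading of this mechanism, not the semantics of any particular calculus: `skip` records how many enclosing handlers for the effect's name must still be passed over, and we assume that the coercion wraps the call f x inside the proxy. All helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Return:
    value: Any


@dataclass
class Eff:
    name: Any
    payload: Any
    k: Callable[[Any], Any]
    skip: int = 0  # enclosing handlers for `name` that must still be skipped


def perform(name, payload):
    return Eff(name, payload, Return)


def bind(m, f):
    if isinstance(m, Return):
        return f(m.value)
    return Eff(m.name, m.payload, lambda x: bind(m.k(x), f), m.skip)


def lift(m, name):
    # An effect named `name` crossing this coercion skips one more handler.
    if isinstance(m, Return):
        return m
    extra = 1 if m.name == name else 0
    return Eff(m.name, m.payload, lambda x: lift(m.k(x), name), m.skip + extra)


def handle(m, name, h, r):
    if isinstance(m, Return):
        return r(m.value)

    def resume(x):
        return handle(m.k(x), name, h, r)

    if m.name != name:
        return Eff(m.name, m.payload, resume, m.skip)
    if m.skip > 0:
        # This handler is skipped; one fewer handler remains to be skipped.
        return Eff(m.name, m.payload, resume, m.skip - 1)
    return h(m.payload, resume)


def run(m):
    if not isinstance(m, Return):
        raise RuntimeError("unhandled effect")
    return m.value


def lift_counter(ff):
    def counted(f):
        proxy = lambda x: bind(perform("tick", ()),
                               lambda _: lift(f(x), "tick"))
        h = lambda _, k: Return(lambda n: bind(k(()), lambda g: g(n + 1)))
        r = lambda y: Return(lambda n: Return((y, n)))
        return bind(handle(ff(proxy), "tick", h, r), lambda g: g(0))
    return counted


result = run(lift_counter(lift_counter(lambda f: f(())))(lambda _: Return(())))
print(result)  # (((), 1), 1)
```

In this model, the tick performed by the outer proxy passes through one lift before reaching the inner handler, so it is forwarded to the outer handler instead.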

<sup>3</sup> A lift coercion behaves like an end-of-scope marker for the name s. This concept has been studied, independently of effects, by various authors [13,10].

Lexically scoped handlers and dynamic allocation of effect labels. Perhaps the most straightforward way to describe the operational behavior of lexically scoped handlers is by means of their encoding in terms of ordinary effect handlers and dynamic generation of effect labels. So, let us first extend our calculus with dynamic allocation of effect labels. We introduce the construct effect s in e, which binds the effect name s to a freshly generated effect label, then executes e. The effect name s is a local variable: its scope is the subexpression e. An effect label is a runtime entity; later in the paper, we let ℓ range over effect labels. In this setting, a "lexically scoped handler" is encoded (simulated) as follows:

$$\begin{array}{l} \mathsf{lex}\text{-}\mathsf{handle}\ e\ \mathsf{with}\ h \mid r = \\ \quad \mathsf{effect}\ s\ \mathsf{in}\ \mathsf{handle}\ e\ (\lambda x.\ \mathsf{perform}\ s\ x)\ \mathsf{with}\ s : h \mid r \end{array} \tag{1}$$

This code first generates a fresh effect label, denoted by the name s. Then, it installs a handler for the name s. This handler monitors the application of the expression e to the anonymous function λx. perform s x, which can be viewed as a "capability" to perform the effect s.

A noteworthy aspect of the syntactic sugar lex-handle e with h | r is that it does not explicitly involve any effect name. This construct is known as a "lexically scoped handler".

A lexically scoped handler can be used to write a fixed version of bad\_counter:

$$\mathsf{counter}\ \mathit{ff} = \lambda f. \begin{pmatrix} \mathsf{lex}\text{-}\mathsf{handle}\ \lambda \mathit{tick}.\ \mathit{ff}\ (\lambda x.\ \mathit{tick}\ ();\ f\ x)\ \mathsf{with} \\ \lambda \_\, k.\ \lambda n.\ k\ ()\ (n+1) \mid \lambda y.\ \lambda n.\ (y, n) \end{pmatrix}\ 0 \tag{2}$$

When lex-handle is executed, a fresh effect label (which is never explicitly mentioned in this code) is generated. The variable tick stands for the "capability" to perform this fresh nameless effect. One can check that the expression counter (counter (λf. f ())) (λ\_. ()) reduces to the value (((), 1), 1), as desired, because the two instances of counter generate two distinct dynamic labels and install one handler for each of these labels. Thus, no collision takes place.
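The same check can be replayed in a small Python model, given here as a sketch under our own assumptions rather than as part of the formal development. A fresh effect label is modeled by a fresh Python object, compared by identity, and `lex_handle` follows the shape of encoding (1); all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Return:
    value: Any


@dataclass
class Eff:
    label: Any
    payload: Any
    k: Callable[[Any], Any]


def perform(label, payload):
    return Eff(label, payload, Return)


def bind(m, f):
    if isinstance(m, Return):
        return f(m.value)
    return Eff(m.label, m.payload, lambda x: bind(m.k(x), f))


def handle(m, label, h, r):
    if isinstance(m, Return):
        return r(m.value)

    def resume(x):
        return handle(m.k(x), label, h, r)

    if m.label is label:  # handler selection is plain label equality
        return h(m.payload, resume)
    return Eff(m.label, m.payload, resume)


def run(m):
    if not isinstance(m, Return):
        raise RuntimeError("unhandled effect")
    return m.value


def fresh_label():
    # A new object is distinct from every previously allocated one.
    return object()


def lex_handle(e, h, r):
    # effect s in handle e (fun x -> perform s x) with s : h | r
    s = fresh_label()
    capability = lambda x: perform(s, x)
    return handle(e(capability), s, h, r)


def counter(ff):
    def counted(f):
        body = lambda tick: ff(lambda x: bind(tick(()), lambda _: f(x)))
        h = lambda _, k: Return(lambda n: bind(k(()), lambda g: g(n + 1)))
        r = lambda y: Return(lambda n: Return((y, n)))
        return bind(lex_handle(body, h, r), lambda g: g(0))
    return counted


result = run(counter(counter(lambda f: f(())))(lambda _: Return(())))
print(result)  # (((), 1), 1)
```

Each call to `lex_handle` allocates its own label, so the inner handler propagates the outer tick instead of intercepting it.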

Arguments in favor of dynamic allocation of effect labels. In summary, dynamic allocation of effect labels is a way of avoiding collisions between effect names. It can express lexically scoped handlers, but does not impose the use of lexically scoped handlers: it also allows working with global names when desired. Its dynamic semantics is simple. It is in use in several established programming languages, such as Standard ML and OCaml.

We believe that lexically scoped handlers are an elegant idiom, which is well suited to many but not all situations. So, we would not be satisfied with a restricted programming language where lexically scoped handlers are the sole form of effect handling. Indeed, lexically scoped handlers impose a somewhat unnatural "capability-passing" style, where the capability to perform an effect must be passed as an argument to a function (or captured in its closure). This style becomes especially cumbersome when multiple effects are involved. Implicit arguments can help, as suggested by Zhang and Myers [41] and by Odersky et al. [27]. However, elaboration of implicit arguments is usually a type-directed translation. If at all possible, we wish to preserve the "type erasure" property: that is, we prefer a language whose operational semantics is not influenced by type information, because such a semantics is easier to explain to an end user. Similarly, we wish to avoid effect coercions because we believe that they introduce unwarranted complexity, making the language and its dynamic semantics more difficult to explain to programmers.

$$\begin{array}{rcl} n & ::= & s \mid \ell \\ v & ::= & () \mid \ell \mid \mathsf{rec}\ f\ x.\ e \mid \S K \\ e & ::= & v \mid x \mid e\ e \mid \mathsf{ref}\ e \mid\ !\,e \mid e := e \mid \mathsf{effect}\ s\ \mathsf{in}\ e \mid \mathsf{perform}\ n\ e \mid \mathsf{handle}\ e\ \mathsf{with}\ n : v \mid v \mid \mathsf{eff}\ \ell\ v\ K \\ K & ::= & \bullet \mid e\ K \mid K\ v \mid \mathsf{ref}\ K \mid\ !\,K \mid e := K \mid K := v \mid \mathsf{perform}\ \ell\ K \mid \mathsf{handle}\ K\ \mathsf{with}\ \ell : v \mid v \end{array}$$

Fig. 1. Syntax of effect values, values, expressions, and evaluation contexts

```
                   effect s in e / σ → e[ℓ/s] / σ[ℓ ↦ ()]
                     perform ℓ v / σ → eff ℓ v • / σ
         handle v with ℓ : h | r / σ → r v / σ
 handle (eff ℓ v K) with ℓ : h | r / σ → h v §(handle K with ℓ : h | r) / σ
                            §K v / σ → K[v] / σ
                 (eff ℓ v1 K) v2 / σ → eff ℓ v1 (K v2) / σ
                 e1 (eff ℓ v2 K) / σ → eff ℓ v2 (e1 K) / σ
handle (eff ℓ v K) with ℓ′ : h | r / σ → eff ℓ v (handle K with ℓ′ : h | r) / σ
```
Fig. 2. The head reduction relation (selected rules)

# 3 Syntax and Semantics

We introduce TesLang, a calculus with mutable state, effect handlers, multiple named effects, dynamic allocation of effect labels, and multi-shot continuations. The operational semantics of this calculus allows a continuation to be invoked several times. With respect to this semantics, the type system presented in this paper (§4) is strongly sound: it rules out all runtime errors (§5). With respect to a dynamic semantics where invoking a continuation twice causes a runtime failure, such as the semantics of OCaml 5, our type system would be weakly sound, because it does not rule out this kind of runtime failure. Ensuring that every continuation is invoked at most once would require an affine type system and is beyond the scope of this paper. We note that an affine program logic, such as Hazel [35], can guarantee that no continuation is invoked twice, therefore can guarantee strong soundness even in the presence of one-shot continuations.

Our small-step operational semantics is very straightforward. It is equipped with dynamic allocation of effect labels and with a standard treatment of effects and effect handlers [2]. When an effect with label ℓ is performed, a dynamic lookup takes place: the nearest enclosing handler that is able to handle the label ℓ is selected. This is expressed, in small-step style, via several reduction rules. In contrast with some papers in the literature, where coercions influence the process of selecting a handler [21,4,5], here, this process is based purely on equality of effect labels.

#### 3.1 Syntax

We let f and x range over an infinite set of variables. We let s range over an infinite set of variables, and we refer to these variables as effect names. These two namespaces are independent of one another: an effect name cannot be passed as a parameter to a function. We let ℓ range over an infinite set of addresses. These addresses model both memory locations and effect labels. Both kinds of entities are dynamically allocated, so, for simplicity, we use a single namespace of addresses and a single store. Whereas variables f, x and effect names s can appear in source programs, memory locations and effect labels ℓ exist only at runtime. The reduction rules of the small-step semantics cause them to appear.

The syntax of effect values, values, expressions, and evaluation contexts is shown in Figure 1.

An effect value n is either an effect name s or an effect label ℓ. This syntactic category is closed under substitutions of effect labels for effect names. It is used in the constructs perform n e and handle e with n : v | v. A programmer always writes perform s e and handle e with s : v | v, where s is an effect name, but the more general form is required in the operational semantics.

A value v is the unit value (), a memory location ℓ, a possibly recursive function rec f x. e, or a continuation §K.

The syntax of expressions e includes values, variables, function application, operations for allocating, reading, and writing references, as well as constructs for allocating a fresh effect label, performing an effect, and handling an effect. Sequencing is encoded as function application: let x = e<sub>1</sub> in e<sub>2</sub> is sugar for (λx. e<sub>2</sub>) e<sub>1</sub>. The construct effect s in e dynamically allocates a new effect label and binds the effect name s to this label in the expression e. The construct perform s v performs an effect whose name is s and whose payload is the value v. The construct handle e with s : h | r monitors the execution of the expression e. If an effect named s is performed, then the effect branch h takes control. If a value is returned, then the return branch r takes control. An effect that carries a name other than s is propagated up through this construct. Finally, the construct eff ℓ v K, an active effect, does not appear in source programs, but plays a role in the operational semantics, as we shall explain in the next subsection.

Our Coq formalization [36] covers a richer calculus, whose features include base types, pairs, sums, and lists.

The syntax of evaluation contexts K defines a right-to-left evaluation order. This choice is arbitrary: it is inspired by Iris's HeapLang language [33], but our results would hold also with left-to-right evaluation.

#### 3.2 Semantics

The operational semantics of TesLang involves two relations, namely the head reduction relation e / σ → e′ / σ′ and the reduction relation e / σ ⟶ e′ / σ′. They act on configurations, where a configuration e / σ is a pair of an expression e and a store σ. The head reduction relation, a fragment of whose definition appears in Figure 2, is the most interesting relation. The reduction relation, whose definition is omitted, allows one step of head reduction to take place under an evaluation context.

A store is a finite map from addresses to values. We use addresses ℓ to denote both memory locations and effect labels. If ℓ denotes a memory location (that is, the address of a reference), then σ(ℓ) is the value stored at this address. If ℓ denotes an effect label, then the value σ(ℓ) is irrelevant: by convention, we use the unit value ().

The rules not shown in Figure 2, such as βv-reduction and the rules for allocating, reading, and writing references, are standard.

The first rule in Figure 2 states that effect s in e allocates a fresh address ℓ, extends the store with a mapping of ℓ to the unit value, and substitutes the effect label ℓ for the effect name s in the expression e. (The rule has the side condition ℓ ∉ dom σ.) According to the second reduction rule, perform ℓ v reduces to an active effect eff ℓ v •. An active effect has the ability to capture the surrounding evaluation context, until it reaches a handler that is able to handle it. In this rule, it is initialized with an empty evaluation context •. The last three rules in Figure 2 show how an active effect captures its evaluation context, one frame at a time. (The last rule has the side condition ℓ ≠ ℓ′.) The third and fourth rules in Figure 2 show how the return branch or the effect branch of a handle construct are taken. In the latter rule, the handler h is applied to the payload value v and to a continuation, which reifies the captured evaluation context K. The continuation contains a copy of the effect handler: this is a deep-handler semantics [15]. The fifth reduction rule in Figure 2 describes the application of a continuation §K to a value v.
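As a worked illustration of these rules, consider a small program of our own choosing (it does not appear in the figures), in which the effect branch h = λx. λk. k x resumes the continuation with the payload and the return branch r = λy. y is the identity. Assuming that the allocated label ℓ satisfies ℓ ∉ dom σ, and writing σ′ for σ[ℓ ↦ ()] and K₀ for the evaluation context handle • with ℓ : h | r, one possible reduction sequence is the following:

$$
\begin{array}{cll}
 & \mathsf{effect}\ s\ \mathsf{in}\ \mathsf{handle}\ (\mathsf{perform}\ s\ ())\ \mathsf{with}\ s : h \mid r \;/\; \sigma & \\
\longrightarrow & \mathsf{handle}\ (\mathsf{perform}\ \ell\ ())\ \mathsf{with}\ \ell : h \mid r \;/\; \sigma' & \text{(allocation of } \ell\text{)} \\
\longrightarrow & \mathsf{handle}\ (\mathsf{eff}\ \ell\ ()\ \bullet)\ \mathsf{with}\ \ell : h \mid r \;/\; \sigma' & \text{(perform, under } K_0\text{)} \\
\longrightarrow & h\ ()\ \S K_0 \;/\; \sigma' & \text{(effect branch)} \\
\longrightarrow^{*} & \S K_0\ () \;/\; \sigma' & (\beta_v\text{, twice)} \\
\longrightarrow & \mathsf{handle}\ ()\ \mathsf{with}\ \ell : h \mid r \;/\; \sigma' & \text{(continuation application)} \\
\longrightarrow & r\ () \;/\; \sigma' & \text{(return branch)} \\
\longrightarrow & () \;/\; \sigma' & (\beta_v)
\end{array}
$$

The fourth line shows the deep-handler behavior: the continuation §K₀ passed to h carries the handler with it, so the handler is reinstalled as soon as the continuation is applied.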

# 4 Type System

#### 4.1 Syntax of types, rows, and signatures

We let α, β, and γ range over an infinite set of type variables. We let θ range over an infinite set of row variables. We distinguish three syntactic categories, namely types, rows, and signatures (Figure 3). The syntax of types is stable under substitutions of types τ for type variables α. The syntax of rows is stable under substitutions of rows ρ for row variables θ, for an ad hoc notion of substitution, which reduces row concatenation expressions "ρ · ρ′" on the fly.<sup>4</sup>

<sup>4</sup> The distinction between rows and signatures enforces the view that a row ρ is a list where each component (known as a "signature") is either a signature for an effect name s or a row variable θ. Thus, we impose a simple form on rows. As an alternate

$$\begin{array}{rcl} \tau,\,\kappa,\,\iota & ::= & \mathsf{unit} \mid \perp \mid \top \mid \alpha \mid \tau\ \mathsf{ref} \mid \tau \stackrel{\rho}{\to} \tau \mid \forall \alpha.\ \tau \mid \forall \theta.\ \tau \\ \rho & ::= & \langle\rangle \mid \sigma \cdot \rho \\ \sigma & ::= & (s : \tau \Rightarrow \tau) \mid \theta \end{array}$$


Fig. 4. The type system (selected rules)

Our types are standard: they include the unit type unit, the bottom and top types ⊥ and ⊤, type variables α, reference types, effect-annotated arrow types, value-polymorphic types, and effect-polymorphic types. Effect-annotated arrow types and effect-polymorphic types are discussed below.

A row is a list of signatures σ. A signature, in turn, is either a singleton signature s : ι′ ⇒ κ′ or a row variable θ. A singleton signature s : ι′ ⇒ κ′ means that performing the effect s is permitted and is analogous to calling a function of argument type ι′ and return type κ′. According to this reading, a singleton signature of the form s : ⊥ ⇒ ⊤ actually forbids the effect s, because a function whose argument type is ⊥ can never be called. We write s : abs as a short-hand for this signature, and we refer to it as an absence signature for the effect s.

In addition to an argument type τ and a return type κ, an arrow type $\tau \xrightarrow{\rho} \kappa$ carries an "effect", that is, a row ρ. Intuitively, a value of type $\tau \xrightarrow{\rho} \kappa$ is a function, which, when applied to an argument of type τ, either returns a result of type κ or performs an effect that is permitted by the row ρ. On top of this standard reading of effect annotations, Tes introduces a novel aspect. The effect annotation ρ is interpreted not only as a set of permitted effects, but also as a precondition: we impose the semantic requirement that a function of type $\tau \xrightarrow{\rho} \kappa$ can be invoked only if the multiset of effect labels denoted by the row ρ has no duplicate elements. This is not a syntactic requirement, which would be either "true" or "false" and would be decided just by inspecting the syntax of the row ρ. Indeed, in general, a row contains occurrences of effect names s, which denote a-priori-unknown effect labels, and of row variables θ, which denote a-priori-unknown multisets of effect labels. What we wish to require is that, at runtime, after effect names and row variables have been substituted away by some substitution η, a function of type $\tau \xrightarrow{\rho} \kappa$ can be invoked only if no effect label appears twice in the closed row η(ρ). Thus, the requirement that "ρ contains no duplicate labels" should be thought of as a disjointness hypothesis bearing on the row ρ. Such a hypothesis may or may not be satisfied, depending on how the effect names and row variables that occur in ρ are instantiated.
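To make the semantic nature of this requirement concrete, the following Python sketch computes the multiset of labels denoted by a row once its names and row variables have been instantiated. It is a simplified model under stated assumptions: rows are lists of strings, row variables are written with a leading `?`, and the names `labels_of`, `is_disjoint`, `delta`, and `eta` are hypothetical helpers chosen for this example.

```python
from collections import Counter


def labels_of(row, delta, eta):
    """Multiset of effect labels denoted by `row`.

    `delta` maps effect names to labels; `eta` maps row variables
    (strings starting with "?") to lists of labels.
    """
    labels = Counter()
    for item in row:
        if item.startswith("?"):
            labels.update(eta[item])   # a row variable denotes a multiset
        else:
            labels[delta[item]] += 1   # an effect name denotes one label
    return labels


def is_disjoint(row, delta, eta):
    """True if no effect label appears twice in the instantiated row."""
    return all(count == 1 for count in labels_of(row, delta, eta).values())
```

The same syntactic row, say `["s", "?theta"]`, satisfies the hypothesis under one instantiation and violates it under another, which is why the requirement cannot be decided by inspecting the syntax of the row alone.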

In Tes, disjointness hypotheses are sometimes explicit and most of the time implicit. In the subsumption judgments (Figure 5), a disjointness context D is explicit: it can be interpreted as a conjunction of disjointness hypotheses. In function types $\tau \xrightarrow{\rho} \kappa$ and in typing judgments Ξ | ∆ | Γ ⊢ e : ρ : τ, an implicit disjointness hypothesis bearing on the row ρ is built in, so there is no need for an explicit disjointness context.

An effect-polymorphic type ∀θ. τ involves a universal quantification over a row variable θ. For instance, the function iter, which iterates over a list, can be defined as follows:

$$\mathsf{iter} = \mathsf{rec}\ \mathit{iter}\ \mathit{xs}\ f.\ \mathsf{match}\ \mathit{xs}\ \mathsf{with}\ (\lambda x\ \mathit{xs}.\ f\ x;\ \mathit{iter}\ \mathit{xs}\ f \mid \lambda\_.\ ())\tag{3}$$

path, one could use a single syntactic category ρ ::= ⟨⟩ | ρ · ρ | (s : τ ⇒ τ) | θ, where a more general form of row concatenation is allowed. This would allow using a standard notion of substitution, and would lead to different statements for some of the row subsumption rules.

This function admits the following value- and effect-polymorphic type:

$$\mathsf{iter} : \forall\alpha.\,\forall\theta.\ \alpha\ \mathsf{list} \to (\alpha \xrightarrow{\theta} \mathsf{unit}) \xrightarrow{\theta} \mathsf{unit}$$

This type states that the call iter xs f is safe, regardless of what the elements of the list xs might be, and regardless of what effects the user function f might perform. This type also guarantees that iter does not perform any effect of its own: instantiating θ with ⟨⟩ shows that this must be the case. Finally, one might think that this type guarantees that iter cannot intercept the effects performed by f. This may or may not be true, depending on which interpretation of effect-polymorphic types is chosen. A stronger interpretation can guarantee this property, but rules out certain useful programming language constructs, such as "dynamic-wind". Conversely, a weaker interpretation of effect-polymorphic types allows type-checking "dynamic-wind", but breaks this guarantee. At this time, the interpretation that we have verified in Coq is the weaker one (§5). We further discuss this point in Section 6.

#### 4.2 The typing judgment

A typing judgment in Tes takes the form Ξ | ∆ | Γ ⊢ e : ρ : τ. It involves three environments: a row- and type-variable context Ξ, which binds row and type variables θ and α; an effect-name context ∆, which binds effect names s; and a type environment Γ, which maps variables x to types τ. This typing judgment states that the expression e has effect ρ and type τ. Like an arrow type, this judgment involves an implicit disjointness hypothesis bearing on the row ρ. That is, this judgment guarantees that it is safe to execute e provided the row variables and type variables in Ξ are instantiated in such a way that the multiset of effect labels denoted by ρ has no duplicate elements.

A selection of the typing rules appears in Figure 4. The typing rules for variables, functions, and applications are the same as in most type-and-effect systems. The typing rules for references are also standard, and are omitted. The rules TypeIntro, TypeElim, RowIntro, RowElim, which introduce and eliminate value- and effect-polymorphic types, are also standard. In the presence of mutable state, an unrestricted introduction rule for polymorphic types is unsound [34]. In this paper, we avoid this problem simply by building the value restriction [39,12] into TypeIntro and RowIntro. Our Coq formalization [36] proposes a more elaborate approach, where function types and typing judgments are annotated with purity attributes. This approach yields a slightly more expressive system, where, in particular, perform s x is considered a pure expression, therefore can receive a polymorphic type.

Rule Effect, read from bottom to top, changes the current effect from ρ to (s : abs) · ρ. Intuitively, this means several things. First, while type-checking e, it is safe to assume that the effect label denoted by s is disjoint from the multiset of effect labels denoted by ρ. This assumption is implicitly expressed by the mere appearance of the row (s : abs) · ρ in the premise. This assumption is justified indeed, since the effect name s is bound to a fresh effect label when effect s in e is executed. Second, because of the absence signature s : abs, one must check that the expression e does not perform any effect with the name s. This seems a natural and unavoidable restriction: if such an effect were allowed, there would be no static effect name by which it could be described. Third, because of the side condition s ∉ ρ, one must check that the row that appears in the premise contains at most one singleton signature for the effect name s. As a counterexample, if the expression e has effect (s : abs) · (s : abs), then the typing rule Effect cannot be applied. The subsumption rule Sub cannot help, because the subsumption judgment (s : abs) · (s : abs) ≤ (s : abs) does not hold. Thus, the rule Effect enforces a disjointness constraint.
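The freshness argument used in the first point can be pictured operationally. The following Python sketch models the binding of an effect name to a freshly allocated label; it is a toy model, and the names `LabelSupply` and `effect_in` are hypothetical, chosen for this illustration rather than taken from the semantics of the paper.

```python
import itertools


class LabelSupply:
    """Allocates effect labels that have never been handed out before."""

    def __init__(self):
        self._counter = itertools.count()

    def fresh(self):
        return next(self._counter)


def effect_in(supply, delta, name, body):
    """Model of `effect s in e`: run `body` with `name` bound to a fresh label.

    `delta` maps effect names to labels; it is copied, so the binding is
    visible only inside `body`.
    """
    extended = dict(delta)
    extended[name] = supply.fresh()
    return body(extended)
```

Because every call to `effect_in` draws a new label from the supply, the label bound to `s` cannot coincide with any label that was already in use, which is the assumption that the row (s : abs) · ρ records.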

Rule Perform states that, when one performs an effect whose signature is s : ι ⇒ κ, one must pass a payload value of type ι, and, in return, one can expect a value of type κ. This supports the intuitive idea that performing an effect is analogous to calling an effect-free function of type ι → κ.

Rule Handle type-checks handle e with s : h | r, where the expression e is monitored by a handler for the effect s. This rule expresses the idea that this construct establishes a boundary between the inside, where effects named s may be performed in accord with the signature s : ι ⇒ κ, and the outside, where effects named s may be performed in accord with a different signature s : ι′ ⇒ κ′. Because s : abs is sugar for s : ⊥ ⇒ ⊤, this rule also covers the common case where the effect s is absent on the outside. Both the effect branch h and the return branch r are part of the "outside world", so their effects are described by the outside row ρ′. This remark explains all occurrences of ρ′ in the last two premises, except the one in the type of the continuation. The continuation, which is the second parameter of the effect branch h, has type $\kappa \xrightarrow{\rho'} \tau'$. Because we have adopted a "deep-handler" semantics (§3), a copy of the handler is reinstalled inside the continuation. This explains why the effect ρ′ and the result type τ′ of the continuation are the same as those of the whole handle construct.

Rule Sub weakens a typing judgment by replacing an effect ρ and a type τ with a weaker effect ρ′ and a weaker type τ′. This rule relies on several subsumption judgments, which we discuss next.

#### 4.3 The subsumption judgments

The subsumption judgments on types, signatures, and rows appear in Figure 5. An original aspect is that these judgments depend on a disjointness context D, which appears on the left of the turnstile. A disjointness context is a (possibly empty, unordered) list of rows, and is interpreted as a conjunction of disjointness hypotheses: one hypothesis bears on each row. For instance, the disjointness context (s<sub>1</sub> : ι<sub>1</sub> ⇒ κ<sub>1</sub>) · (s<sub>2</sub> : ι<sub>2</sub> ⇒ κ<sub>2</sub>), (s<sub>3</sub> : ι<sub>3</sub> ⇒ κ<sub>3</sub>) · θ, which is a list of two rows, is equivalent to a conjunction of two disjointness hypotheses. The first hypothesis is equivalent to s<sub>1</sub> ≠ s<sub>2</sub>: it represents the assumption that the effect names s<sub>1</sub> and s<sub>2</sub> denote two distinct effect labels. The second hypothesis expresses the assumption that the effect label denoted by s<sub>3</sub> is not a member of the multiset of effect labels denoted by θ and that this multiset has no duplicate elements.


Fig. 5. The subsumption judgments

In the subsumption rules, the disjointness context is extended in the rule Arrow and exploited in the rule Erase. Elsewhere, it is just transported.

**Subsumption on types.** The subsumption judgment on types D ⊢ τ ≤<sub>T</sub> τ′ means that, under the hypothesis D, τ is a subtype of τ′. The rules in Figure 5 state that this relation is reflexive, transitive, and admits ⊥ and ⊤ as bottom and top elements. On function types, as usual, subsumption is contravariant in the domain and covariant in the effect and in the codomain. One original aspect of Arrow is that this rule enriches the disjointness context: in the premises, the disjointness context changes from D to D, ρ′. The intuitive reason why this is sound is that if someone uses a function at type $\tau' \xrightarrow{\rho'} \kappa'$ then (at the point where the function is used) the disjointness hypothesis ρ′ must be satisfied, because this hypothesis is part of our interpretation of function types. Thus, when proving that a function of type $\tau \xrightarrow{\rho} \kappa$ can be used as a function of type $\tau' \xrightarrow{\rho'} \kappa'$, it is safe to rely on the disjointness hypothesis ρ′.

**Subsumption on signatures.** The subsumption judgment on signatures takes the form D ⊢ σ ≤<sub>S</sub> σ′. Signature subsumption is reflexive and transitive. (Reflexivity is given by SigRefl; transitivity is derivable.) According to SigCons, unlike the standard function type constructor · → ·, the signature constructor s : · ⇒ · is covariant in its domain and contravariant in its codomain. Indeed, when the signature s : ι ⇒ κ appears in the effect of an expression e, this means that e has permission to perform an effect named s at type ι ⇒ κ. In other words, e can assume that performing an effect named s is analogous to calling a function of type ι → κ. This explains the reversed variance.
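The reversed variance can be checked on a toy type language. The following Python sketch assumes only three kinds of types (`"bot"`, `"top"`, and base types compared by equality) and ignores the disjointness context; the function names `subtype` and `sig_sub` are hypothetical. It is meant to show why the absence signature s : abs, that is, s : ⊥ ⇒ ⊤, lies below every other signature for s.

```python
def subtype(t, u):
    """Subtyping on a tiny type language: bot <= t <= top, else equality."""
    return t == u or t == "bot" or u == "top"


def sig_sub(sig, sig2):
    """(s : i => k) <= (s : i2 => k2).

    Covariant in the payload type (i <= i2) and contravariant in the
    reply type (k2 <= k), the opposite of ordinary function types.
    """
    (i, k), (i2, k2) = sig, sig2
    return subtype(i, i2) and subtype(k2, k)


ABS = ("bot", "top")  # the absence signature s : abs
```

Under this reading, moving up in the subsumption order grants more permissions: `sig_sub(ABS, ("unit", "unit"))` holds, whereas the converse does not.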

**Subsumption on rows.** The row subsumption judgment is D ⊢<sub>b</sub> ρ ≤<sub>R</sub> ρ′. The Boolean parameter b will be explained shortly. Row subsumption is reflexive and transitive. (Reflexivity is derivable; transitivity is given by RowTrans.) By combining Empty, Extend, RowCons, Swap, and RowTrans, one finds that if two rows, viewed as multisets of effect signatures, are related by multiset inclusion, then they are related by subsumption. Thus, subsumption allows permuting row entries in arbitrary ways and extending a row with new entries.

The last row subsumption rule, Erase, allows dropping an effect signature of the form s : abs. This rule may seem plausible because, both in the presence of the effect signature s : abs and in its absence, the effect s is forbidden. However, an unqualified axiom ⊢ (s : abs) · ρ ≤<sub>R</sub> ρ would be unsound. This is due to our interpretation of the row carried by a typing judgment (or by a function type) as a disjointness hypothesis. By changing a typing judgment that carries the row (s : abs) · ρ into one that carries the row ρ, one removes the hypothesis that the effect label denoted by s is not a member of the multiset of effect labels denoted by ρ. In order to safely remove a hypothesis, one must prove that it is satisfied. This explains why Erase must carry the premise D ⊢ s # ρ, whose intuitive meaning is that "the hypotheses in D guarantee that the effect label denoted by s is not among the effect labels denoted by ρ".

The parameter b serves to forbid a use of Erase under RowCons. Erase requires this flag to be true, but RowCons sets it to false in its premise. Without this restriction, one could first combine Erase and DisjEmpty to prove ⊢ (s : abs) · ⟨⟩ ≤<sub>R</sub> ⟨⟩, then use RowCons and induction to obtain ⊢ (s : abs) · ρ ≤<sub>R</sub> ρ without any side condition, thus circumventing the side condition in Erase.

The four rules that define the effect/row disjointness judgment D ⊢ s # ρ are straightforward. The first two rules decompose the row ρ, which is a list of effect signatures σ. The last two rules look up the disjointness context D so as to find a disjointness hypothesis ρ that implies the goal. Whether ρ implies the goal is decided based on a simple syntactic criterion: the relation · ⊆<sub>m</sub> · denotes multiset inclusion; the row on the right-hand side is viewed as a multiset of effect signatures.<sup>5</sup>
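A possible reading of this judgment is sketched below in Python. It is a simplification under explicit assumptions: the exact rules are those of Figure 5, which this sketch does not reproduce; rows are reduced to lists of effect names and row variables (types are dropped); and the names `multiset_included` and `disjoint_from` are hypothetical. The sketch decomposes the row entry by entry and, for each entry, searches the context for a hypothesis that contains both `s` and that entry.

```python
from collections import Counter


def multiset_included(small, big):
    """Multiset inclusion between two Counters."""
    return all(big[item] >= n for item, n in small.items())


def disjoint_from(context, s, row):
    """Sketch of the judgment D |- s # row.

    `context` is a list of rows (the disjointness hypotheses); each row is
    a list of effect names and row variables. The empty row is trivially
    disjoint from `s`; every entry needs a supporting hypothesis.
    """
    for entry in row:
        goal = Counter([s, entry])
        if not any(multiset_included(goal, Counter(hyp)) for hyp in context):
            return False
    return True
```

With the context `[["s", "?theta"]]`, the goal `s # ?theta` is provable, whereas with an empty context it is not; this matches the remark that Erase needs a nonempty disjointness context to be applicable to a nonempty row.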

The desire to support Erase is the reason why the subsumption judgments carry a disjointness context. In a hypothetical simplified system where these judgments do not carry such a context, the premise of Erase would have to use an empty disjointness context True. This premise would become True ⊢ s # ρ, which is false, so Erase would become inapplicable. Yet Erase is desirable, because it is useful in practice. We use it to type-check our encoding of a lexically scoped handler: this is illustrated in Section 4.4.

**Why is ⊢ (s : abs) · ρ ≤<sub>R</sub> ρ unsound?** In the presence of this axiom, the judgment ⊢ (s : abs) · (s : abs) ≤<sub>R</sub> (s : abs) would be derivable. This judgment can be exploited to type-check the following unsafe program:

> 1 effect s in
> 2 handle
> 3 handle (perform s ()) with s : λx \_. not x | λ\_. true
> 4 with s : λ\_ \_. () | λ\_. ()

This program is unsafe because the effect s is performed with a payload of type unit, namely the unit value () on line 3, and this effect is handled by the innermost handler, also on line 3, which expects the payload x to be a Boolean value. When this program is executed, it becomes stuck by attempting to execute the function application not ().

Yet, under the assumption ⊢ (s : abs) · (s : abs) ≤<sub>R</sub> (s : abs), this program is well-typed, with an empty row and with the type unit. Beginning at the root and working towards the leaves, the type derivation begins with an application of Effect, which changes the empty row into the row (s : abs). Then, by using Sub and by exploiting the above assumption, the row (s : abs) can be changed to (s : abs) · (s : abs). At this point, the harm is done. Indeed, under the row (s : abs) · (s : abs), the subprogram at lines 2–4 is well-typed. The fact that this row includes two signatures for the effect name s allows us to install two handlers for this name. The handler on line 2 allows its handlee (the expression on line 3) to perform effects according to the signature s : unit ⇒ unit. The handler on line 3 allows its handlee to perform effects as per s : bool ⇒ unit. The expression perform s () is type-checked with respect to the composite row (s : unit ⇒ unit) · (s : bool ⇒ unit), which means that this expression must respect either of these two signatures. It does indeed respect the first one, so it is well-typed.

<sup>5</sup> Our Coq code [36] presently employs a different representation of disjointness contexts and a different definition of the effect/row disjointness judgment. We believe, but have not yet checked, that the Coq and paper formulations are equivalent.

#### 4.4 Examples

**Filter.** Recall the higher-order iteration function iter (Eq. 3), whose type is

$$\begin{array}{c} \textbf{iter}: \forall \alpha. \,\forall \theta. \,\,\alpha \,\,\mathsf{list} \to (\alpha \xrightarrow{\theta} \mathsf{unit}) \xrightarrow{\theta} \mathsf{unit}. \end{array}$$

Let us use iter in the definition of filter:

filter xs f = let g = (λx. if f x then perform yield x) in iter xs g

The expression filter xs f "yields" each element x of the list xs in turn, by performing a yield effect if f x returns true. In Tes, filter is well-typed, and its type is:

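$$\mathsf{filter} : \forall\alpha.\,\forall\theta.\ \alpha\ \mathsf{list} \to (\alpha \xrightarrow{\theta} \mathsf{bool}) \xrightarrow{(\mathsf{yield}\,:\,\alpha \Rightarrow \mathsf{unit})\,\cdot\,\theta} \mathsf{unit}$$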

Checking that filter is well-typed is not difficult. Under the assumption that f has type $\alpha \xrightarrow{\theta} \mathsf{bool}$, the subexpression f x has effect θ. Under the assumption that x has type α, the subexpression perform yield x has effect (yield : α ⇒ unit). Because our subsumption rules allow extending a row with a new entry and exchanging row entries, the composite subexpression if f x then perform yield x admits the composite effect (yield : α ⇒ unit) · θ.

What does filter's type mean? Ostensibly, the row (yield : α ⇒ unit) · θ tells us that every effect performed by filter xs f must be either a yield effect or an effect caused by f. Less obviously, these alternatives must be mutually exclusive: indeed, the row (yield : α ⇒ unit) · θ carries the implicit requirement that the effect label denoted by yield is not among the effect labels denoted by θ. In other words, filter's type forbids f from performing yield effects.
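The behaviour that this type rules out can be observed in a small simulation. The following Python sketch encodes effects with generators: performing an effect suspends the computation with a request, and a handler resumes it. This is an approximation under stated assumptions: effect labels are plain strings rather than dynamically generated labels, Python generators give one-shot continuations only, and the names `perform`, `pure`, `iter_`, `filter_`, and `collect_yields` are hypothetical.

```python
def perform(label, payload):
    """Perform an effect: suspend with a request, resume with the reply."""
    reply = yield (label, payload)
    return reply


def pure(value):
    """A computation that performs no effect and returns `value`."""
    return value
    yield  # unreachable; its presence makes `pure` a generator function


def iter_(xs, f):
    """iter xs f: apply the effectful function f to each element in turn."""
    for x in xs:
        yield from f(x)


def filter_(xs, f):
    """filter xs f: yield each element x for which f x returns true."""
    def g(x):
        keep = yield from f(x)
        if keep:
            yield from perform("yield", x)
    yield from iter_(xs, g)


def collect_yields(comp):
    """Handle every "yield" effect by recording its payload and resuming."""
    seen = []
    try:
        request = next(comp)
        while True:
            label, payload = request
            if label != "yield":
                raise RuntimeError(f"unhandled effect: {label!r}")
            seen.append(payload)
            request = comp.send(None)
    except StopIteration:
        return seen
```

With an effect-free predicate, `collect_yields(filter_([1, 2, 3, 4], lambda x: pure(x % 2 == 0)))` returns `[2, 4]`. If the predicate itself performs `"yield"` effects, the handler cannot tell them apart from those performed by `filter_`, and the collected list is polluted; the disjointness requirement carried by the row (yield : α ⇒ unit) · θ excludes exactly this situation.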

The reader may wonder what prevents us from instantiating θ with a row that includes the effect name yield, such as (yield : α ⇒ unit). The answer is, nothing prevents such an instantiation. The result, however, would be a view of filter as a function whose effect is (yield : α ⇒ unit) · (yield : α ⇒ unit). Such an effect carries an unsatisfiable disjointness hypothesis, namely yield ≠ yield. As a result, once the type of filter has been instantiated in this way, filter cannot be called anymore.<sup>6</sup>

<sup>6</sup> Technically, an application of this instantiated filter function can still be well-typed, but only if it appears in the body of a function which itself carries an unsatisfiable disjointness hypothesis and therefore can never be called.

**Lexically scoped handlers.** We now derive a typing rule for lexically scoped handlers. Recall the encoding of a lexically scoped handler (Eq. 1):<sup>7</sup>

$$\mathsf{lex\text{-}handle}\ e\ \mathsf{with}\ h \mid r \;=\; \mathsf{effect}\ s\ \mathsf{in}\ \mathsf{handle}\ e\ (\lambda x.\ \mathsf{perform}\ s\ x)\ \mathsf{with}\ s : h \mid r$$

For this construct, Tes admits the following derived typing rule:

$$\frac{\begin{array}{ll} \Xi \mid \Delta \mid \Gamma \vdash e : \rho : \forall \theta.\,(\iota \xrightarrow{\theta} \kappa) \xrightarrow{\theta \cdot \rho} \tau & s \notin \Gamma, \rho, \iota, \kappa, \tau, \tau' \\ \Xi \mid \Delta \mid \Gamma \vdash h : \rho : \iota \to (\kappa \xrightarrow{\rho} \tau') \xrightarrow{\rho} \tau' & \Xi \mid \Delta \mid \Gamma \vdash r : \rho : \tau \xrightarrow{\rho} \tau' \end{array}}{\Xi \mid \Delta \mid \Gamma \vdash \mathsf{lex\text{-}handle}\ e\ \mathsf{with}\ h \mid r : \rho : \tau'}\ \text{LexHandle}$$

This rule is similar to the typing rule for lexically scoped handlers that appears in Figure 3 of Biernacki et al.'s paper [6]. What is new and noteworthy is that we obtain this rule as a special case of a more permissive type discipline, Tes, which supports general effect handlers, as opposed to just lexically scoped handlers.

In LexHandle, whereas the effect on the outside is ρ, the effect on the inside is θ · ρ. That is, inside the handlee, one more effect is permitted. The handlee (the expression e) must be polymorphic in the row variable θ: that is, it must treat this extra effect as an abstract effect.

The derivation of LexHandle involves an application of Effect and an application of Handle. While proving that the premises of Handle hold, a key step is to prove that the type of the effect branch h can be weakened as follows, where ρ′ is a shorthand for (s : abs) · ρ:

$$\frac{\Xi \mid \Delta \mid \Gamma \vdash h \; : \; \rho \; : \; \iota \to (\kappa \xrightarrow{\rho} \tau') \xrightarrow{\rho} \tau'}{\Xi \mid \Delta \mid \Gamma \vdash h \; : \; \rho \; : \; \iota \to (\kappa \xrightarrow{\rho'} \tau') \xrightarrow{\rho'} \tau'} \qquad \rho' = (s : \mathsf{abs}) \cdot \rho$$

It is not at all obvious that this is possible! Two occurrences of ρ must be changed into ρ′. One occurrence is positive and one is negative, and the rows ρ and ρ′ are not equal. Still, this implication can be established, via rule Sub. One must check the following chain of subsumption relations:

$$
\iota \to (\kappa \xrightarrow{\rho} \tau') \xrightarrow{\rho} \tau' \;\leq_T\; \iota \to (\kappa \xrightarrow{\rho} \tau') \xrightarrow{\rho'} \tau' \;\leq_T\; \iota \to (\kappa \xrightarrow{\rho'} \tau') \xrightarrow{\rho'} \tau'
$$

The first step requires ⊢<sub>b</sub> ρ ≤<sub>R</sub> ρ′, which, by Extend, is true. The second step requires ρ′ ⊢<sub>true</sub> ρ′ ≤<sub>R</sub> ρ, which, by Erase, is true as well. The disjointness hypothesis ρ′ plays a key role: indeed, True ⊢<sub>true</sub> ρ′ ≤<sub>R</sub> ρ is false. In other words, Erase is applicable because the disjointness hypothesis ρ′ is available, and this hypothesis exists because Arrow causes it to appear as it descends into the domains of two function types that are annotated with ρ′.

<sup>7</sup> This encoding requires choosing an arbitrary name s that does not occur in e, h or r. Furthermore, in the derivation of the typing rule LexHandle, s may need to be renamed. On paper, we would normally not mention these details. However, because our Coq code does not currently allow α-conversion of effect names, we make s a parameter of the macro lex-handle and we include a freshness hypothesis bearing on s in LexHandle.

**Counter.** Using the typing rule LexHandle, it is straightforward to check that counter (§2, Eq. 2) can be assigned the following type:

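$$\mathsf{counter} : \forall\alpha\,\beta\,\gamma.\ (\forall\theta.\ (\alpha \xrightarrow{\theta} \beta) \xrightarrow{\theta} \gamma) \to \forall\theta.\ (\alpha \xrightarrow{\theta} \beta) \xrightarrow{\theta} (\gamma * \mathsf{int})$$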

This means that counter accepts an arbitrary effect-polymorphic second-order function ff and produces a function ff′ whose type is similar to ff's type. The only difference between the types of ff and ff′ is in their result types, to wit, γ versus γ \* int.

It is not hard to see that the expression counter (counter (λf. f ())) (λ\_. ()), where two instances of counter are nested, is also well-typed, and that its type is (unit \* int) \* int.

**Mix.** The following second-order function, mix, involves a potentially challenging mixture of features:

$$\mathsf{mix}\ f = \mathsf{handle}\ (\mathsf{perform}\ s\ ();\ f\ ())\ \mathsf{with}\ s : \lambda\_\ k.\ k\ () \mid \lambda\_.\ ()$$

The effect name s occurs free in this code, so this is not an instance of a lexically scoped handler. (We assume that the name s is introduced by the surrounding context.) The subexpression perform s (); f () visibly performs the effect s and calls the unknown function f, which itself may perform various effects, perhaps including the effect s. This subexpression is monitored by a handler for the effect s at type unit ⇒ unit.

In Tes, mix is well-typed. In fact, it admits several types. We show three: the first two are equivalent, and the last one subsumes the first two.

The first idea that comes to mind may be: "since f has an unknown effect, let's represent this effect with a row variable θ". Thus, one introduces a row variable θ, and one assumes that f has type $\mathsf{unit} \xrightarrow{\theta} \mathsf{unit}$. Under this assumption, one finds that perform s (); f () has effect (s : unit ⇒ unit) · θ. (The subsumption rule Extend is used, twice, to merge the effect of perform s () and the effect of f ().) Finally, using Handle, one finds that the body of the function mix has effect (s : abs) · θ. In summary, mix admits the following type:

$$\texttt{mix} : \forall \theta. \left(\texttt{unit} \xrightarrow{\theta} \texttt{unit}\right) \xrightarrow{\left(\texttt{s} : \texttt{abs}\right) \cdot \theta} \texttt{unit} \tag{4}$$

The effect (s : abs) · θ carried by the second arrow means that mix never throws the effect s and transmits whatever effects f may throw, provided these effects do not include s. Indeed, the row (s : abs)· θ is interpreted not only as a description of mix's potential effects, but also as a disjointness constraint. Thus, the row (s : abs) · θ in this type (4) cannot be replaced with just θ. Such a replacement would amount to discarding the disjointness constraint, which would be unsound.

The reader may wonder what happens if θ is instantiated, in the above type, with a row that mentions s, such as s : int ⇒ int. Technically, this is permitted, but yields a version of mix whose effect is (s : abs) · (s : int ⇒ int). Such a function can never be called.

Thus, this type (4) effectively forbids f from performing effect s. One may wonder whether this fact can be made explicitly visible in the type of mix. In fact, it can. By the subsumption rules Arrow, Extend, and Erase, the type (4) is equivalent to the following type:

$$\texttt{mix} : \forall \theta. \left(\texttt{unit} \xrightarrow{\left(\texttt{s} : \texttt{abs}\right) \cdot \theta} \texttt{unit}\right) \xrightarrow{\left(\texttt{s} : \texttt{abs}\right) \cdot \theta} \texttt{unit} \tag{5}$$

Indeed, under the disjointness constraint carried by the outer arrow, the rows θ and (s : abs) · θ are equivalent.

It is worth noting that this type allows the function f to use the effect s internally, if desired, and at an arbitrary type, provided this effect is handled internally by f and does not escape.

Finally, one may wonder whether it is necessary to forbid f from visibly performing effect s. In fact, it is not: one can allow f to perform this effect and let it escape, provided it is performed at type unit ⇒ unit, which is the type expected by the handler inside mix. It is not difficult to check that mix admits the following type:

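$$\texttt{mix} : \forall \theta. \left(\texttt{unit} \xrightarrow{\left(\texttt{s} : \texttt{unit} \Rightarrow \texttt{unit}\right) \cdot \theta} \texttt{unit}\right) \xrightarrow{\left(\texttt{s} : \texttt{abs}\right) \cdot \theta} \texttt{unit} \tag{6}$$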

This type (6) is in fact more general than (that is, a subtype of) the previous type (5). This follows directly from the fact that s : abs is a short-hand for s : ⊥ ⇒ ⊤ and from the subsumption rules SigCons, RowCons, and Arrow.
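The dynamic behaviour behind these three types can be simulated with Python generators. The sketch below is illustrative and rests on several assumptions: effect labels are plain strings, continuations are one-shot (a limitation of generators, whereas the calculus has multi-shot continuations), and the names `perform`, `handle_s`, `mix`, and `run_closed`, as well as the `log` parameter used to observe handled payloads, are hypothetical.

```python
def perform(label, payload=None):
    """Perform an effect: suspend with a request, resume with the reply."""
    reply = yield (label, payload)
    return reply


def handle_s(comp, log):
    """Deep handler for effect "s".

    The effect branch records the payload and resumes the continuation
    with None (playing the role of the unit value); the return branch
    returns None. Effects other than "s" are forwarded to the outside.
    """
    try:
        request = next(comp)
        while True:
            label, payload = request
            if label == "s":
                log.append(payload)
                reply = None
            else:
                reply = yield request  # forward to an enclosing handler
            request = comp.send(reply)
    except StopIteration:
        return None


def mix(f, log):
    """mix f = handle (perform s (); f ()) with s : ... | ..."""
    def body():
        yield from perform("s", ())
        yield from f()
    return (yield from handle_s(body(), log))


def run_closed(comp):
    """Run a computation that is expected to perform no unhandled effect."""
    try:
        request = next(comp)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError(f"unhandled effect: {request[0]!r}")
```

If `f` performs no effect, the log contains one entry, for the effect performed by `mix` itself. If `f` performs `"s"` with a unit payload, as permitted by type (6), the handler inside `mix` also intercepts that effect, and the log contains two entries. If `f` performs some other effect, the request escapes `mix`, in agreement with the row variable θ in the effect of `mix`.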

# 5 Metatheory

In this section, we present the general architecture of the proof of our type soundness statement (Theorem 3), which states that, if a closed program e is well-typed, then e is safe: that is, e may diverge or terminate with a value, but cannot perform an unhandled effect. Full details are found in our Coq code [36].

Our first step is to interpret our typing judgments as semantic typing judgments. A semantic typing judgment Ξ | ∆ | Γ ⊨ e : ρ : τ is a logical assertion stating that substituting certain values for the free variables of e yields a closed program that meets a certain specification. To fill in the details, one must define precisely which values may be substituted and what specification is met.

To do so, we introduce TesLogic, an extension of Iris [16], an expressive Separation Logic. Iris's base logic has no built-in support for effects and handlers, but allows constructing a program logic with such support. de Vilhena and Pottier define such a logic, Hazel [35]. Because Hazel is tailored for unnamed effects and one-shot continuations, we cannot re-use it. Nevertheless, in the design of TesLogic, we do rely on one of Hazel's key features, protocols.

A protocol Ψ describes a service on which the handlee can rely and which the handler must implement. Mathematically, it is a binary relation between a value v, the payload of the effect, and a predicate Φ, the precondition of the continuation for this effect. A typical example of a protocol is the pre/post protocol

Weakest precondition

$$\mathit{wp}\ e\ \langle E \rangle \{\Phi\} \;\triangleq\; \mathsf{ValidDistinct}\ E.1 \mathrel{-\!\!\ast} \mathit{ewp}\ e\ \langle E \rangle \{\Phi\}$$

Basic weakest precondition

$$\begin{array}{rcl} \mathit{ewp}\ v\ \langle E \rangle \{\Phi\} & \triangleq & \Phi(v) \\ \mathit{ewp}\ (\mathsf{eff}\ \ell\ v\ K)\ \langle E \rangle \{\Phi\} & \triangleq & \exists \Psi.\ (\ell, \Psi) \in E \,\ast\, (\uparrow_{\Box} \Psi)\ v\ (\lambda w.\ \rhd\, \mathit{ewp}\ K[w]\ \langle E \rangle \{\Phi\}) \\ \mathit{ewp}\ e\ \langle E \rangle \{\Phi\} & \triangleq & \forall \sigma.\ S(\sigma) \mathrel{-\!\!\ast} {}^{\top}{\Rrightarrow}^{\emptyset} \left\{ \begin{array}{l} (\exists\, e', \sigma'.\ e \,/\, \sigma \longrightarrow e' \,/\, \sigma') \,\ast \\ \forall\, e', \sigma'.\ e \,/\, \sigma \longrightarrow e' \,/\, \sigma' \mathrel{-\!\!\ast} {}^{\emptyset}{\Rrightarrow}^{\emptyset}\, \rhd\, {}^{\emptyset}{\Rrightarrow}^{\top} \\ \quad S(\sigma') \,\ast\, \mathit{ewp}\ e'\ \langle E \rangle \{\Phi\} \end{array} \right. \end{array}$$

Persistent upward closure

$$(\uparrow_{\Box} \Psi)\ v\ \Phi \;\triangleq\; \exists \Phi'.\ \Psi\ v\ \Phi' \,\ast\, \Box\, \forall w.\ \Phi'(w) \mathrel{-\!\!\ast} \Phi(w)$$

Validity-and-distinctness property

$$\mathsf{ValidDistinct}\ L \;\triangleq\; \mathsf{NoDup}\ L \,\wedge\, \bigwedge_{\ell \in L} \ell \mapsto ()$$

{Φ<sub>1</sub>}.{Φ<sub>2</sub>}, defined as λ v Φ. Φ<sub>1</sub>(v) ∗ □ ∀ w. Φ<sub>2</sub>(w) −∗ Φ(w). We use this protocol (in the interpretation of signatures, Figure 7) to attach a precondition Φ<sub>1</sub> and a postcondition Φ<sub>2</sub> to an effect: performing an effect with payload v is permitted if Φ<sub>1</sub>(v) holds, and one can assume that it returns a value w such that Φ<sub>2</sub>(w) holds. The symbol □ is Iris's persistence modality. Here, it reflects the fact that continuations are multi-shot: a single perform expression can "return" several times with several different values of w, so we must be prepared to exploit Φ<sub>2</sub> several times.

To reason about labeled effects, we introduce the notion of a protocol list E, a list of pairs of a label and a protocol. Therefore, whereas Hazel's weakest precondition modality is parameterized with a single protocol, ours is parameterized with a protocol list. In our setting, the assertion wp e ⟨E⟩{Φ} means that (1) it is safe to execute e; (2) if e produces a value v then Φ(v) holds; and (3) if e performs an effect labeled ℓ then it does so according to a protocol Ψ such that (ℓ, Ψ) ∈ E holds. Its definition appears in Figure 6. It is broadly similar to Hazel's wp modality, save for three aspects: the use of a protocol list E; the use of a persistent upward closure; and the appearance of a validity-and-distinctness property as an assumption of the weakest precondition assertion. The persistent upward closure again has to do with the fact that continuations are multi-shot. The validity-and-distinctness property expresses two properties of the labels in the list E: first, these labels are pairwise distinct; second, these labels have been allocated. The latter fact is expressed by a persistent points-to assertion [37].
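As a rough Boolean shadow of the validity-and-distinctness property, the following Python sketch checks the two conditions on the labels of a protocol list. It is a simplification by assumption: the actual property is a separation-logic assertion in Iris, not a decidable check, and the names `protocol_labels` and `valid_distinct`, as well as the representation of allocated labels as a set, are hypothetical.

```python
def protocol_labels(protocol_list):
    """First projection of a protocol list, written E.1 in the text."""
    return [label for (label, _protocol) in protocol_list]


def valid_distinct(labels, allocated):
    """Check that the labels are pairwise distinct and all allocated.

    `allocated` stands in for the persistent points-to assertions: it is
    the set of labels that have been allocated so far.
    """
    no_dup = len(set(labels)) == len(labels)
    return no_dup and all(label in allocated for label in labels)
```

A protocol list that mentions the same label twice fails the first condition, which is how the assumption built into wp mirrors the disjointness hypothesis carried by a row.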

Interpretation of types (selected cases)

$$\begin{array}{rcl} \mathcal{V}[\![\tau \xrightarrow{\rho} \kappa]\!]^{\delta}_{\eta}(v) & \triangleq & \Box\, \forall w.\ \mathcal{V}[\![\tau]\!]^{\delta}_{\eta}(w) \mathrel{-\!\!\ast} \mathit{wp}\ (v\ w)\ \langle \mathcal{R}[\![\rho]\!]^{\delta}_{\eta} \rangle \{\mathcal{V}[\![\kappa]\!]^{\delta}_{\eta}\} \\ \mathcal{V}[\![\forall \theta.\, \tau]\!]^{\delta}_{\eta}(v) & \triangleq & \forall E.\ \mathcal{V}[\![\tau]\!]^{\delta}_{\eta,\,\theta \mapsto E}(v) \end{array}$$

Interpretation of rows and signatures

$$\mathcal{R}[\![\rho]\!]^{\delta}\_{\eta} \triangleq \bigcup\_{\sigma \in \rho} \mathcal{S}[\![\sigma]\!]^{\delta}\_{\eta} \qquad \mathcal{S}[\![s : \iota \Rightarrow \kappa]\!]^{\delta}\_{\eta} \triangleq [(\delta(s), \{\mathcal{V}[\![\iota]\!]^{\delta}\_{\eta}\}.\{\mathcal{V}[\![\kappa]\!]^{\delta}\_{\eta}\})] \qquad \mathcal{S}[\![\theta]\!]^{\delta}\_{\eta} \triangleq \eta(\theta)$$

Interpretation of typing judgments

$$\begin{aligned} (\Xi \mid \Delta \mid \Gamma \vDash e : \rho : \tau) & \triangleq \forall \eta, \delta, vs.\ \mathcal{G}[\![\Gamma]\!]^{\delta}\_{\eta}(vs) \mathrel{-\!\!*} \mathit{wp}\ (e[vs][\delta])\ \langle \mathcal{R}[\![\rho]\!]^{\delta}\_{\eta} \rangle\ \{\mathcal{V}[\![\tau]\!]^{\delta}\_{\eta}\} \\ \mathcal{G}[\![\Gamma]\!]^{\delta}\_{\eta}(vs) & \triangleq \forall \{x \mapsto \tau\} \subseteq \Gamma.\ \mathcal{V}[\![\tau]\!]^{\delta}\_{\eta}(vs(x)) \end{aligned}$$

Fig. 7. Interpretation of types, rows, signatures, and typing judgments

This notion of wp enjoys a set of reasoning rules that we omit. The following theorem states that it is sound to reason about programs by means of these rules:

Theorem 1 (Soundness of TesLogic). If wp e ⟨[]⟩ {Φ} holds, then e is safe.

With TesLogic at hand, let us come back to the definition of the semantic judgment Ξ | ∆ | Γ ⊨ e : ρ : τ.

As usual, a type τ is interpreted as a semantic type, that is, a persistent predicate 𝒱⟦τ⟧<sup>δ</sup><sub>η</sub> on values. More unusually, a row ρ is interpreted as a protocol list ℛ⟦ρ⟧<sup>δ</sup><sub>η</sub>, defined as ⋃<sub>σ∈ρ</sub> 𝒮⟦σ⟧<sup>δ</sup><sub>η</sub>, the list concatenation of the interpretations of the elements of ρ. The environment δ maps effect names to effect labels; η maps type variables to semantic types and row variables to protocol lists.

This said, our interpretation of types (Figure 7) is mostly standard [19]. The interpretation of a function type, 𝒱⟦τ →<sup>ρ</sup> κ⟧<sup>δ</sup><sub>η</sub>, is the set of values v such that the application of v to a value w in 𝒱⟦τ⟧<sup>δ</sup><sub>η</sub> satisfies a wp assertion with protocol list ℛ⟦ρ⟧<sup>δ</sup><sub>η</sub> and postcondition 𝒱⟦κ⟧<sup>δ</sup><sub>η</sub>. What is crucial is that the validity-and-distinctness property that we have built into the definition of wp formalizes the requirement that effect names be pairwise distinct. The interpretation of an effect-polymorphic type involves a quantification ∀E over protocol lists.

Theorem 2 (Fundamental Theorem). The syntactic judgment entails the semantic judgment: Ξ | ∆ | Γ ⊢ e : ρ : τ ⟹ Ξ | ∆ | Γ ⊨ e : ρ : τ.

We establish this theorem by induction on the syntactic typing judgment. For every syntactic typing rule, we prove that the interpretation of the conclusion follows from the interpretations of the premises.

The previous two theorems lead directly to the desired type soundness result:

Theorem 3 (Soundness of Tes). If ∅ | ∅ | ∅ ⊢ e : ⟨⟩ : unit, then e is safe.

# 6 Related Work

Hillerström and Lindley [14] study the core calculus of Links [9], a functional programming language for web applications, which they extend with support for effect handlers. Taking advantage of Links's row-based approach to typechecking records, they annotate function types with rows of effects. Their rows use Rémy's kind discipline [32] to ensure that an effect name can never appear twice in a row.

Leijen [22] formalizes a subset of the Koka language [23]. He presents a calculus with support for handlers and globally defined effects, a type system with value and effect polymorphism, and a compilation strategy for explicitly-typed programs. This strategy relies on a selective CPS transformation [26], which he extends with support for effect polymorphism. A row in Leijen's system is univariate: it contains at most one row variable. Tes, in contrast, allows a row to contain several row variables. This ability is exploited, for example, in the typing rule LexHandle. Indeed, the premise contains the effect-polymorphic type ∀θ. (α →<sup>θ</sup> β) →<sup>θ·ρ</sup> τ, where θ abstracts away the fresh effect label that is allocated by lex-handle.

A notable omission from Leijen's formalization is Koka's inject [21], which is akin to a lift coercion. Biernacki et al. [4] are the first authors to provide a formal treatment of such a construct. They define its operational semantics and they propose a type system with effect polymorphism and univariate rows. They present the first binary logical relations for effect handlers, and they use these relations to prove that their system is sound. In a later paper [5], the same authors introduce λ<sup>HEL</sup>, a calculus that supports both dynamic allocation of effect labels and effect coercions. In addition to the lift coercion, they consider (1) the swap coercion, which exchanges two effects in a row; (2) the cons coercion, which rearranges effects deep in a row; and (3) composition of coercions. These new coercions do not add expressiveness: they can be expressed in terms of lift. Still, they help programmers control the dynamic search for a handler. Biernacki et al. propose a type system with support for universal and existential types. Although counter, discussed in Sections 2 and 4, is expressible in λ<sup>HEL</sup>, Biernacki et al.'s type system does not accept this program. (This has been confirmed by the authors in a personal communication.) The technical reason why counter is ill-typed is that the subsumption rules are not sufficiently flexible: an abstract row θ cannot be weakened to a larger row. It is not obvious how to overcome this issue, because the interpretation of a signature in Biernacki et al.'s system depends on the signature's position in the row. Tes, in contrast, allows extension, thanks to the rule Extend.

Zhang and Myers [41] present "a new semantics based on tunneling", which they claim avoids "accidental handling" by construction. As far as we understand, however, they do not propose a semantics in the usual sense, that is, a reduction semantics. Instead, their "semantics" seems to be a translation of the surface language into a core calculus, λ⇓⇑. This translation is not formally defined: it is sketched by way of examples. Furthermore, as noted by Biernacki et al. [6], there is a discrepancy between the paper presentation of λ⇓⇑ and its Coq formalization. The paper does not mention dynamic generation of effect labels, but the calculus that is formalized in Coq supports this feature via a construct that generates a fresh effect label and installs a handler for this label; in other words, a lexically scoped handler.

For this calculus with lexically scoped handlers, Zhang and Myers propose a type system with support for effect polymorphism. They prove its soundness using binary logical relations. Then, they exploit these logical relations to establish interesting typed contextual equivalence laws. One law [41, Example 1] shows that an effect-polymorphic function cannot intercept the effects represented by an abstract row variable. This law seems to express the intuitive idea of "absence of accidental handling", but we remark that this notion is never formally defined.

Zhang and Myers [41] and other authors [8] suggest that "absence of accidental handling", sometimes also referred to as "effect safety", has something to do with parametricity. Unfortunately, "parametricity" itself is a somewhat loosely defined concept. As far as we understand, the word "parametricity" refers to the fact that a syntactic universal type is interpreted via a meta-level universal quantification over a certain universe of semantic types. However, the strength of this meta-level quantification depends on which universe of semantic types is chosen. A smaller universe yields a system with weaker universal types, which may enjoy fewer equivalence laws, but may also admit more well-typed programs.

To illustrate this point, let us ask whether our calculus, TesLang, can be extended with a "dynamic-wind" construct [11]. This construct, dynamic-wind p e q, monitors the execution of e and invokes the thunk p whenever control enters e (at the beginning of e's execution and every time e is resumed) and invokes the thunk q whenever control leaves e (at the end of e's execution and every time e performs an effect). To type-check this construct, one might extend Tes with the following typing rule:

DynamicWind

$$\frac{\Xi \mid \Delta \mid \varGamma \vdash p : \rho : \mathsf{unit} \to \mathsf{unit} \qquad \Xi \mid \Delta \mid \varGamma \vdash e : \rho : \tau \qquad \Xi \mid \Delta \mid \varGamma \vdash q : \rho : \mathsf{unit} \to \mathsf{unit}}{\Xi \mid \Delta \mid \varGamma \vdash \mathsf{dynamic\text{-}wind}\ p\ e\ q : \rho : \tau}$$

We have proved that this rule is sound with respect to the interpretation of types presented in Section 5. So, our semantic model supports dynamic-wind. Furthermore, our semantic model arguably enjoys "parametricity", since a universal type is interpreted via a meta-level universal quantification. Yet, introducing dynamic-wind breaks Zhang and Myers's desired equivalence law [41, Example 1], because it allows observing arbitrary effects, without knowledge of their name and type. Therefore, "parametricity" does not guarantee "absence of accidental handling".
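The control flow of dynamic-wind can be illustrated outside TesLang. The following is only a sketch under strong assumptions: it uses Python generators as a stand-in for effectful computations, where `yield` plays the role of performing an effect and `send` the role of resuming. Generators are one-shot, so the multi-shot continuations of TesLang are not modelled, and the names `dynamic_wind`, `run`, and `body` are hypothetical. The sketch does show the point made above: the wrapper observes every effect that crosses it without inspecting its name or payload.

```python
def dynamic_wind(p, body, q):
    """Wrap generator `body`; call p on every entry, q on every exit."""
    p()                                  # control enters the body for the first time
    try:
        payload = next(body)
    except StopIteration as done:
        q()                              # the body finished without performing an effect
        return done.value
    while True:
        q()                              # control leaves the body: an effect was performed
        reply = yield payload            # forward the effect unchanged to the outer handler
        p()                              # control re-enters the body: it is being resumed
        try:
            payload = body.send(reply)
        except StopIteration as done:
            q()                          # control leaves the body for good
            return done.value


def run(computation, handler):
    """A trivial handler loop: answer every effect with handler(payload)."""
    try:
        payload = next(computation)
        while True:
            payload = computation.send(handler(payload))
    except StopIteration as done:
        return done.value


def body():
    x = yield ("ask", "x")               # first effect
    y = yield ("ask", "y")               # second effect
    return x + y


log = []
result = run(
    dynamic_wind(lambda: log.append("in"), body(), lambda: log.append("out")),
    lambda payload: 21,
)
```

In this run the body is entered three times (once at the start and once per resumption) and left three times (once per effect and once at the end), so `log` alternates between `"in"` and `"out"`.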

The lesson that we draw from this remark is that a programming language designer is faced with a tension between making the language more powerful by introducing constructs such as dynamic-wind, allowing new programs to be written, and making the language less powerful by forbidding such constructs, thereby validating new equivalence laws. Our (unary) semantic model (§5) errs on the side of admitting more constructs and fewer equivalence laws. In future work, it would be interesting to propose a (binary) semantic model that admits fewer constructs and validates more laws, so as to prove that Tes without dynamic-wind validates Zhang and Myers's law [41, Example 1].

Despite their previous studies of coercions [4,5], Biernacki et al. [6] argue against coercions, which they deem impractical for real-world programming, and propose a type system for a language that supports lexically scoped handlers only. They present two semantics for this language: (1) an open semantics, where effect names are not substituted with labels, and where evaluation is defined among open terms in a capture-avoiding way; and (2) a generative semantics, where effect names are substituted at runtime with effect labels, as in TesLang. By means of binary logical relations, they prove that the type system is sound and that the two semantics are equivalent.

Kammar and Pretnar [18] show that a calculus with effects and handlers but without references and without dynamic allocation of effect labels admits a type system with unrestricted polymorphism. Thus, generalization applies even to an expression that performs and handles effects. Kammar and Pretnar establish the soundness of their system via a syntactic approach [40]. The version of Tes that we have formalized in Coq [36] distinguishes pure and impure expressions and allows generalizing the type of a pure expression. The pure expressions include expressions that perform or handle effects. Allocating a fresh effect label is still considered impure. Although such an allocation seems intuitively harmless, our current semantic model interprets allocation as an Iris "update", and Iris does not allow exchanging a universal quantifier with an update modality, so we are unable to justify that allocation is pure. We conjecture that this problem would perhaps not appear in a syntactic approach.

# 7 Conclusion

In this paper, we have argued in favor of a simple semantics for effect handlers, where the dynamic search for a handler is based purely on equality of effect labels, and where fresh labels can be generated at runtime. This language can express, but is not restricted to, lexically scoped handlers. We have proposed a type system equipped with type and effect polymorphism and with a powerful subsumption relation. A distinguishing feature is the idea that a row expresses a disjointness requirement on effect labels. We have established type soundness via a semantic approach.

In future work, it would be desirable to strengthen our semantic model and turn it into a binary model, so as to establish contextual equivalence laws such as Zhang and Myers's [41]. We also wish to investigate support for modules and inference of principal types, with the ultimate aim of proposing a strong type system for OCaml 5.

# References


42. Zhang, Y., Salvaneschi, G., Beightol, Q., Liskov, B., Myers, A.C.: Accepting blame for safe tunneled exceptions. In: Programming Language Design and Implementation (PLDI). pp. 281–295 (Jun 2016)


# **Interpreting Knowledge-based Programs**

Alexander Knapp<sup>1</sup>, Heribert Mühlberger<sup>1</sup>, and Bernhard Reus<sup>2</sup>

> <sup>1</sup> Universität Augsburg, Augsburg, Germany {knapp, muehlber}@informatik.uni-augsburg.de <sup>2</sup> University of Sussex, Brighton, UK bernhard@sussex.ac.uk

**Abstract.** Knowledge-based programs specify multi-agent protocols with epistemic guards that abstract from how agents learn and record facts or information about other agents and the environment. Their interpretation involves a mutual dependency between the evaluation of epistemic guards over the reachable states and the derivation of the reachable states depending on the evaluation of epistemic guards. We apply the must/cannot analysis of synchronous programming languages to the interpretation problem of knowledge-based programs and demonstrate that the resulting constructive interpretation is monotone and has a least fixed point. We relate our approach with existing interpretation schemes for both synchronous and asynchronous programs. We describe an implementation of the constructive interpretation and illustrate the procedure by several examples and an application to the Java memory model.

# **1 Introduction**

Knowledge-based programs [14] describe multi-agent systems based on explicit knowledge tests on what an agent knows or does not know about itself, other agents, and the environment: Extending standard programs, an agent may look beyond what it can directly observe by reasoning about the possible states of the other agents and the environment in all possible program executions. Such non-local, epistemic conditions abstract from how an agent may learn and record particular environmental facts or information about other agents. Thus knowledge-based programs rather are specifications of (multiagent) protocols that may be implemented by standard, directly executable programs. For being implementable in the first place, however, it has to be ensured that the knowledge guards can be resolved consistently given all possible program executions.

Consider for example a bit transmission [14, Ex. 4.1.1, Ex. 7.1.1], where a sender S has to transmit a bit sbit over a lossy channel to a receiver R who has to acknowledge the reception, again over a lossy channel. This can be modelled by a knowledge-based program over the state variables sbit ∈ {0, 1}, rval ∈ {⊥, 0, 1}, and ack ∈ {0, 1} as follows: S can only directly observe (read) sbit and ack, and R only rval (but both may write all variables); (K<sub>R</sub> sbit = 0) ∨ (K<sub>R</sub> sbit = 1) expresses that R knows sbit's value and is abbreviated by K<sub>R</sub> sbit. The behaviour description consists of a looping guarded command with two branches that is started with rval = ⊥ and ack = 0, but sbit left undetermined:

$$\begin{array}{lll} \mathbf{do} & \neg \mathsf{K}\_{\mathsf{S}}\, \mathsf{K}\_{\mathsf{R}}\, \mathsf{sbit} \rightarrow (\mathsf{rval} \leftarrow \mathsf{sbit}\ \mathbf{or}\ \mathbf{skip}) & (\mathsf{S}) \\ \mathbin{[\,]} & \mathsf{K}\_{\mathsf{R}}\, \mathsf{sbit} \land \neg \mathsf{K}\_{\mathsf{R}}\, \mathsf{K}\_{\mathsf{S}}\, \mathsf{K}\_{\mathsf{R}}\, \mathsf{sbit} \rightarrow (\mathsf{ack} \leftarrow 1\ \mathbf{or}\ \mathbf{skip}) & (\mathsf{R}) \\ \mathbf{od} & & \end{array}$$

The guarded branches are separated by [], or means a non-deterministic choice, and skip does nothing: S sends the bit as long as it does not know that R received it, and R keeps acknowledging once it has learnt the bit and does not know that S knows this fact. The epistemic formulæ K<sub>a</sub> ϕ in the program are to be interpreted as in classical Kripke semantics: ϕ holds in all states (or worlds) that agent a currently deems possible. Which states these are is regulated on the one hand by what a can observe: any state that is indistinguishable from the current one by the available observations is possible for the agent. In the example only S can observe sbit, though, due to the protocol, it should be possible that eventually R knows its value. On the other hand, the possible states depend on which runs of the knowledge-based program may actually happen, i.e., which states are reachable taking epistemically guarded transitions: If only the actions of the program are taken, it is impossible to reach a state satisfying both rval ≠ ⊥ and rval ≠ sbit, which, however, is present in the global state space; but it is decisive that it is not reachable in any execution in order to have some execution where K<sub>R</sub> sbit can become true.

The interpretation of knowledge-based programs hinges precisely on this mutual dependency between the evaluation of epistemic guards over the reachable states and the derivation of the reachable states depending on the evaluation of the epistemic guards. This implicit definition of the epistemic state of the agents by the observables and the reachable states of the commonly known protocol is in stark contrast to Baltag's epistemic action models [4,31], where the epistemic state is given and manipulated explicitly. In many cases, including the bit transmission protocol, the reachable state space may be computed using static analysis techniques without taking into account the epistemic nature of the guards. However, the interplay between knowledge and reachability may sometimes become more intricate: The more states are reachable the less is known definitely, and the guards will in turn influence what is reachable positively or negatively.

Consider, for another example, a variable setting problem [14, Exc. 7.5] involving a single agent a and a single state variable x ∈ {0, 1, 2, 3}, where a cannot observe x directly. The agent executes the following guarded command starting with x = 0:

$$\mathbf{if}\ \ \mathsf{K}\_{a}\, x \neq 1 \rightarrow x \leftarrow 3 \ \ \mathbin{[\,]}\ \ \mathsf{K}\_{a}\, x \neq 3 \rightarrow x \leftarrow 1\ \ \mathbf{fi}$$

Being an initial condition, x = 0 is reachable, whereas x = 2 is not reachable as 2 is never assigned. However, two different sets of reachable states make for a consistent interpretation of the knowledge guards for the remaining values: {x = 0, x = 1}, where K<sub>a</sub> x ≠ 1 is false and K<sub>a</sub> x ≠ 3 is true, and {x = 0, x = 3}, with the opposite results. The singleton set {x = 0} is ruled out, since both guards would be true such that x = 3 and x = 1 are reachable; and {x = 0, x = 1, x = 3} is impossible, since both guards are false and thus neither x = 1 nor x = 3 is reachable. Breaking this cycle by making one of the transitions unconditional on knowledge as, e.g., in

$$\begin{array}{ll} \mathbf{if} & \mathsf{K}\_{a}\, x \neq 1 \rightarrow x \leftarrow 3 \\ \mathbin{[\,]} & \mathsf{K}\_{a}\, x \neq 3 \rightarrow x \leftarrow 2 \\ \mathbin{[\,]} & \mathsf{true} \rightarrow x \leftarrow 1 \\ \mathbf{fi} & \end{array}$$

yields a knowledge-based program with the unique consistent interpretation {x = 0, x = 1, x = 2}, where x = 0 is reachable as the initial condition. For computing its behaviour, however, several steps are needed, first reasoning that x = 1 is reachable, then that x = 3 is not reachable, and, finally, that x = 2 is reachable.
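The consistent interpretations of both variable setting programs can be checked by brute force. The sketch below rests on a simplifying assumption that follows from the example: since the single agent cannot observe x, it deems every reachable state possible, so a guard K<sub>a</sub> x ≠ c holds exactly when no reachable state has x = c. The names `knows_not`, `derive`, and `consistent_interpretations` are hypothetical helpers, not part of any tool discussed here.

```python
from itertools import combinations

VALUES = (0, 1, 2, 3)
INITIAL = frozenset({0})                 # the program starts with x = 0


def knows_not(c):
    """Guard K_a (x != c): no state assumed reachable has x == c."""
    return lambda reach: all(x != c for x in reach)


def always(reach):
    """Guard true: independent of knowledge."""
    return True


# Each branch is a pair (guard, value assigned to x).
CYCLIC = [(knows_not(1), 3), (knows_not(3), 1)]
BROKEN = [(knows_not(1), 3), (knows_not(3), 2), (always, 1)]


def derive(program, assumed):
    """States reachable when the guards are evaluated over `assumed`."""
    return INITIAL | {target for guard, target in program if guard(assumed)}


def consistent_interpretations(program):
    """All candidate sets that reproduce themselves under `derive`."""
    candidates = {
        frozenset(c) | INITIAL
        for r in range(len(VALUES) + 1)
        for c in combinations(VALUES, r)
    }
    return sorted(tuple(sorted(c)) for c in candidates if derive(program, c) == c)


cyclic_result = consistent_interpretations(CYCLIC)
broken_result = consistent_interpretations(BROKEN)
```

Under these assumptions the cyclic program has two self-reproducing candidate sets and the cycle-breaking program has exactly one; each set includes the initial state x = 0.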

*Related Work.* In their introduction and seminal treatise on knowledge-based programs [13,14], Fagin et al. characterise the unique interpretability of such programs by their "dependence on the past" w.r.t. some non-empty class of transition systems: The evaluation of knowledge guards in a state coincides for all interpretations in the class that share a common past of the state. A sufficient condition for this dependence is that the program "provides epistemic witnesses" for all interpretations of the class such that not knowing something at some point in time has a counterexample in the past. A sufficient condition for this provision, in turn, is that the program is "synchronous", i.e., that all agents can determine the global time from their local states. For example, the bit transmission protocol provides epistemic witnesses and thus is uniquely interpretable; but it is not synchronous. The cycle-breaking variable setting program is also uniquely interpretable, but does not provide epistemic witnesses. For "asynchronous" knowledge-based programs, De Haan et al. [10] suggest relying on classical iteration of the non-monotone reachability functional that interprets the knowledge modalities according to what currently is assumed to be reachable. The computation process is started with all states assumed to be reachable and stops when some set of states is repeated. This approach fixes some semantics for all knowledge-based programs, also for those which are cyclic and contradictory or only self-fulfilling.

The problem of mutual dependence of guard evaluation and reachability has also occurred in the design of synchronous programming languages [6] for embedded systems, like Esterel [7] or Lustre [18], which rely on "perfect synchrony": a step for reacting to some inputs takes zero time and output signals are produced at exactly the same time as the input signals. Since thus the status of a signal to be produced can be queried at the same time, this requires "logical coherence" saying that a (non-input) signal is present in a step of execution if, and only if, a command emitting this signal is executed in this step. Whereas Lustre forbids cyclic programs on a syntactic basis, Berry's approach to the semantics of Esterel [8] singles out "reactive" (at least one execution) and "determinate" (at most one execution) programs using a static executability analysis: It is computed which signals *must* be present, i.e., have to occur inevitably, and which signals *cannot* be present, i.e., have no emitting execution. This is also referred to as must/cannot analysis and has to be performed several times for finding a fixed point of all the signal statuses.

In logic programming involving "negation as failure", under- and over-approximations in terms of three- and four-valued logics lead to the "Kripke-Kleene fixpoint" and "well-founded" models; see [11] for an overview. There, however, the temporal dimension of reachability or executability is not involved. The "stable model semantics" [16,5] stresses the rational inclusion or exclusion of atoms: A set of atoms M is "stable" for a logic program Π if it coincides with the minimal set of atoms inferable from the "reduct" Π<sup>M</sup>, which is obtained from Π by deleting each clause that has a negative literal ¬p in its body with p ∈ M, and all negative literals in the bodies of the remaining clauses. The definition is not algorithmic or constructive; the minimality condition rules out self-fulfilling solutions, the reduction process avoids contradictions. Gelfond's "epistemic specifications" [15] extend (disjunctive) logic programs with a modality K for "subjective literals" for representing incomplete information in programs with several stable models.
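The stable model condition described above can be tested directly on small propositional programs. The following sketch assumes a hypothetical clause encoding `(head, positive_body, negative_body)` and enumerates all candidate sets, which is only feasible for a handful of atoms; it is meant as an illustration of the definition, not as an efficient solver.

```python
from itertools import chain, combinations


def reduct(program, m):
    """Drop clauses with a negative literal ¬p where p ∈ m; drop all other negative literals."""
    return [(head, pos) for head, pos, neg in program if not (set(neg) & m)]


def least_model(definite):
    """Minimal set of atoms inferable from a negation-free program (forward chaining)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def is_stable(program, m):
    m = set(m)
    return least_model(reduct(program, m)) == m


def stable_models(program):
    atoms = sorted({a for head, pos, neg in program for a in (head, *pos, *neg)})
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1)
    )
    return [set(s) for s in subsets if is_stable(program, s)]


choice = [("p", (), ("q",)), ("q", (), ("p",))]    # p ← ¬q.  q ← ¬p.
self_fulfilling = [("p", ("p",), ())]              # p ← p.
contradictory = [("p", (), ("p",))]                # p ← ¬p.
```

The three example programs mirror the phenomena discussed for knowledge-based programs: `choice` has two stable models, the self-supporting set `{p}` of `self_fulfilling` is rejected by minimality, and `contradictory` has no stable model at all.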

*Contributions.* We apply the principles of the must/cannot analysis to the interpretability problem of knowledge-based programs. After recalling some basic notions of epistemic logic and epistemic transition structures (Sect. 2), we first recapitulate the approaches by Fagin et al. [14] and De Haan et al. [10] in terms of epistemically guarded transition systems, a syntax-agnostic format for knowledge-based programs (Sect. 3). For a more direct analysis, our account of those designs is state-based rather than run-based. We demonstrate the results and the limits of both interpretation schemes by several examples that illustrate (a-)synchronicity and non-monotone interpretation for cyclic, contradictory, or self-fulfilling programs. The latter behaviour is the main motivation for our reformulation of the interpretation problem in terms of epistemic must/can transition structures which offer lower and upper bounds on the behaviour of a knowledge-based program (Sect. 4). We show that this constructive interpretation is always monotone and yields a least fixed point. However, lower and upper bound of the fixed point need not always coincide and we relate decided fixed points with the notions of "providing epistemic witnesses" and synchronicity. We then derive a representation of the behaviour of a knowledge-based program as a general rule system with not only positive but also negative premisses (Sect. 5). Such rule systems correspond to logic programs involving "negation as failure" and the intended solutions form "stable models". The must/can approximation technique, its monotonicity, and its fixed point properties directly transfer to such rule systems. We finally describe an implementation of our constructive interpretation approach in the "Temporal Epistemic Model Interpreter and Checker" (tEmIc, Sect. 6). For model checking interpreted knowledge-based programs, the tool supports CTLK, the combination of "Computation Tree Logic" (CTL) with epistemic logic. Moreover, this logic can also be used in program guards; the interpretation of such temporal-epistemic programs extends the previous approaches. We give some applications to the analysis of the Java memory model.

# **2 Epistemic Logic and Epistemic Transition Structures**

We briefly summarise the basic notions of epistemic logic for expressing knowledge guards [31,30]. We then define epistemic transition structures as the domain of interpretation of knowledge-based programs. These transition structures combine the temporal dimension of executing a program with the epistemic dimension for evaluating what agents know. Both the logic and the transition structures are built over an *epistemic signature* Σ = (P, A) that consists of a set of *propositions* P and a set of *agents* A.

#### **2.1 Epistemic Logic**

An *epistemic structure* K = (W, R, L) over (P, A) is given by a set of *worlds* W, an A-family of epistemic *accessibility relations* R = (R<sub>a</sub> ⊆ W × W)<sub>a∈A</sub>, and a *labelling* L: W → ℘P assigning each world a set of propositions. In concrete examples, we will require R<sub>a</sub> to be an equivalence relation such that if (w<sub>1</sub>, w<sub>2</sub>) ∈ R<sub>a</sub>, then agent a cannot distinguish between the two worlds w<sub>1</sub> and w<sub>2</sub>. The *epistemic formulæ* ϕ ∈ Φ<sub>P,A</sub> over (P, A) are defined by the following grammar:

ϕ ::= p | false | ¬ϕ | ϕ<sub>1</sub> ∧ ϕ<sub>2</sub> | K<sub>a</sub> ϕ

where p ∈ P and a ∈ A. The epistemic formula K<sub>a</sub> ϕ is to be read as "agent a *knows* ϕ". We use the usual propositional abbreviations true for ¬false and ϕ<sub>1</sub> ∨ ϕ<sub>2</sub> for ¬(¬ϕ<sub>1</sub> ∧ ¬ϕ<sub>2</sub>). Furthermore, we consider the epistemic modality M as the dual of K, such that M<sub>a</sub> ϕ abbreviates ¬K<sub>a</sub> ¬ϕ and is to be read as "agent a *deems* ϕ *possible*". The *satisfaction relation* of an epistemic formula ϕ ∈ Φ<sub>P,A</sub> over an epistemic structure K = (W, R, L) over (P, A) at a world w ∈ W, written K, w ⊨ ϕ, is inductively defined by

$$\begin{aligned} K, w &\models p \iff p \in L(w) \\ K, w &\not\models \mathrm{false} \\ K, w &\models \neg \varphi \iff K, w \not\models \varphi \\ K, w &\models \varphi\_1 \land \varphi\_2 \iff K, w \models \varphi\_1 \text{ and } K, w \models \varphi\_2 \\ K, w &\models \mathsf{K}\_a \varphi \iff K, w' \models \varphi \text{ for all } w' \in W \text{ with } (w, w') \in R\_a \end{aligned}$$
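The inductive definition translates almost literally into a recursive evaluator for finite structures. The sketch below is a minimal illustration; the tuple encoding of formulas (`("prop", p)`, `("false",)`, `("not", ϕ)`, `("and", ϕ1, ϕ2)`, `("K", a, ϕ)`) and the names `Structure` and `sat` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    worlds: frozenset   # W
    access: dict        # agent a -> set of pairs (w, w') in R_a
    label: dict         # world -> set of propositions L(w)


def sat(k, w, phi):
    """Decide K, w |= phi by recursion on the formula."""
    tag = phi[0]
    if tag == "prop":
        return phi[1] in k.label[w]
    if tag == "false":
        return False
    if tag == "not":
        return not sat(k, w, phi[1])
    if tag == "and":
        return sat(k, w, phi[1]) and sat(k, w, phi[2])
    if tag == "K":
        agent, body = phi[1], phi[2]
        return all(sat(k, v, body) for (u, v) in k.access[agent] if u == w)
    raise ValueError(f"unknown formula constructor: {tag!r}")


def possible(agent, phi):
    """M_a phi, the dual of K_a, as the abbreviation ¬K_a ¬phi."""
    return ("not", ("K", agent, ("not", phi)))


# Two worlds; agent a cannot tell them apart, agent b can.
worlds = frozenset({"w0", "w1"})
example = Structure(
    worlds=worlds,
    access={
        "a": {(u, v) for u in worlds for v in worlds},
        "b": {(u, u) for u in worlds},
    },
    label={"w0": {"p"}, "w1": set()},
)
p = ("prop", "p")
```

In the example structure, p holds at `w0`, agent b knows p there, agent a does not, and agent a deems p possible even at `w1`.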

#### **2.2 Epistemic Transition Structures**

An epistemic transition structure combines a temporal transition relation with an epistemic accessibility relation over a common set of states. The transitions describe which states can be reached from a set of initial states, the accessibilities specify which states are indistinguishable. Knowledge formulæ are evaluated over the associated global epistemic structure. This derived structure has the reachable states as its worlds and reuses the accessibility relation and the labelling but restricted to the reachable states.

Formally, an *epistemic transition structure* M = (S, E, L, S<sub>0</sub>, T) over (P, A) is given by an epistemic structure (S, E, L), a set of temporally *initial states* S<sub>0</sub> ⊆ S, and a temporal *transition relation* T ⊆ S × S. We write S(M) for S, T(M) for T, etc. The (temporally) *reachable states* S<sub>ω</sub>(M) = ⋃<sub>0≤k</sub> S<sub>k</sub>(M) and *transition relation* T<sub>ω</sub>(M) = ⋃<sub>0≤k</sub> T<sub>k</sub>(M) of M are inductively defined by

$$\begin{aligned} S\_0(M) &= S\_0, \quad S\_{k+1}(M) = S\_k(M) \cup \{ s' \mid \text{ex. } s \in S\_k(M) \text{ s.t. } (s, s') \in T \}, \\ T\_0(M) &= \emptyset, \quad T\_{k+1}(M) = T\_k(M) \cup \{ (s, s') \in T \mid s \in S\_k(M) \}. \end{aligned}$$

The associated *epistemic structure* of M is given by

$$K(M) = (S\_{\omega}(M), E \cap S\_{\omega}(M)^2, L \upharpoonright S\_{\omega}(M))$$

where S<sub>ω</sub>(M)<sup>2</sup> abbreviates S<sub>ω</sub>(M) × S<sub>ω</sub>(M) and L↾S<sub>ω</sub>(M) denotes labelling L restricted to domain S<sub>ω</sub>(M). The *satisfaction relation* of an epistemic formula ϕ ∈ Φ<sub>P,A</sub> over M at an s ∈ S<sub>ω</sub>(M), written M, s ⊨ ϕ, is defined as

$$M, s \models \varphi \iff K(M), s \models \varphi\,.$$
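For a finite structure, the chains S<sub>k</sub>(M) and T<sub>k</sub>(M) stabilise after finitely many steps, so S<sub>ω</sub>(M), T<sub>ω</sub>(M), and the associated epistemic structure K(M) can be computed by plain iteration. The following sketch assumes sets of hashable states and a dictionary of per-agent accessibility relations; the function names are hypothetical.

```python
def reachable(initial, transitions):
    """Return (S_omega, T_omega) by iterating S_k and T_k until both are stable."""
    states, taken = set(initial), set()
    while True:
        # Both updates use the current S_k, as in the inductive definition.
        new_taken = taken | {(s, t) for (s, t) in transitions if s in states}
        new_states = states | {t for (s, t) in transitions if s in states}
        if new_states == states and new_taken == taken:
            return frozenset(states), frozenset(taken)
        states, taken = new_states, new_taken


def associated_structure(access, label, initial, transitions):
    """K(M): worlds, accessibility and labelling restricted to the reachable states."""
    reach, _ = reachable(initial, transitions)
    access_r = {
        agent: {(u, v) for (u, v) in pairs if u in reach and v in reach}
        for agent, pairs in access.items()
    }
    label_r = {s: label[s] for s in reach}
    return reach, access_r, label_r


# Four states; state 3 has an outgoing transition but is itself unreachable.
S = {0, 1, 2, 3}
E = {"a": {(u, v) for u in S for v in S}}
L = {s: {f"x={s}"} for s in S}
T = {(0, 1), (1, 2), (3, 0)}

reach, taken = reachable({0}, T)
worlds, access, label = associated_structure(E, L, {0}, T)
```

Knowledge formulæ over M would then be evaluated on `(worlds, access, label)` only, so the unreachable state 3 plays no role in what agent a deems possible.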

The set of epistemic transition structures over Σ = (P, A) sharing the same *epistemic state basis* B = (S, E, L, S<sub>0</sub>) is denoted by M<sub>Σ</sub>(B). We say that M<sub>1</sub> ⊆ M<sub>2</sub> for M<sub>1</sub>, M<sub>2</sub> ∈ M<sub>Σ</sub>(B) if T(M<sub>1</sub>) ⊆ T(M<sub>2</sub>) and similarly extend union and intersection from transition relations to epistemic transition structures.

# **3 Knowledge-based Programs**

Knowledge-based programs extend standard programs by explicit knowledge tests. Their interpretation involves a cycle: the evaluation of the epistemic guards depends on the program's reachable states, the derivation of the reachable states on the evaluation of the program's epistemic guards.

We render knowledge-based programs in a syntax-agnostic format as epistemically guarded transition systems. Like epistemic transition structures, these guarded systems operate on a global set of states with epistemic accessibilities and a propositional labelling. All program steps are represented as knowledge-guarded actions of the form ϕ ⊃ B with ϕ an epistemic formula and B a relation on the semantic states. Knowledge-independent decisions are obtained by choosing ϕ = true, and any kind of program control structure can be expressed by a judicious choice of guarded actions.

Breaking up the cyclic step of assigning meaning to a knowledge-based program, an epistemically guarded transition system Γ is interpreted over an epistemic transition structure M, yielding another epistemic transition structure Γ<sup>M</sup>. A guarded action ϕ ⊃ B of Γ contributes those (s, s′) ∈ B for which M, s ⊨ ϕ, where, in particular, s is reachable in M. What is sought is a consistent interpretation with Γ<sup>M</sup> = M such that reachability and knowledge are mutually justified. Finding such a balanced structure is complicated by the fact that the interpretation functional is not monotone in general: the more is reachable, the less is known, and this may make more or fewer states reachable.

After introducing and illustrating our format of knowledge-based programs, we summarise and adapt two existing approaches to their interpretation that have been devised for run-based rather than state-based systems: De Haan et al. [10] propose to iterate the interpretation functional starting from an epistemic transition structure where all states are reachable. Iteration stops when either a fixed point is reached or, due to non-monotonicity, a contradiction is found. In this way all knowledge-based programs are assigned some semantics and there is no distinction between meaningful and contradictory or just self-fulfilling programs. The original approach by Fagin et al. [13,14] characterises knowledge-based programs that admit a unique consistent interpretation by the notion of dependence on the past. A sufficient condition of providing epistemic witnesses is developed which, in particular, applies to the subclass of synchronous knowledge-based programs.

#### **3.1 Epistemically Guarded Transition Systems**

An *epistemically guarded transition system* Γ = (S, E, L, S0, T ) over (P, A) is given by an epistemic state basis (S, E, L, S0) over (P, A) and a set T of *epistemically guarded actions* ϕ ⊃ B consisting of an epistemic formula ϕ ∈ ΦP,A as *guard* and a transition relation B ⊆ S × S.

*Example 1.* (a) Consider the bit transmission problem of the introduction:

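$$\begin{array}{l} \mathtt{do}\ \neg\mathsf{K}\_{\mathsf{S}}\,\mathsf{K}\_{\mathsf{R}}\,\mathit{sbit} \rightarrow (\mathit{rval} \leftarrow \mathit{sbit}\ \mathtt{or}\ \mathtt{skip}) \\ \mathtt{[]}\ \ \mathsf{K}\_{\mathsf{R}}\,\mathit{sbit} \land \neg\mathsf{K}\_{\mathsf{R}}\,\mathsf{K}\_{\mathsf{S}}\,\mathsf{K}\_{\mathsf{R}}\,\mathit{sbit} \rightarrow (\mathit{ack} \leftarrow 1\ \mathtt{or}\ \mathtt{skip})\ \mathtt{od} \end{array}$$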

A sender agent S sends a bit sbit ∈ {0, 1} to a receiver agent R over an unreliable channel by setting rval ∈ {⊥, 0, 1}; and R acknowledges the reception over an unreliable channel by setting ack ∈ {0, 1}. Again, we abbreviate (K<sub>R</sub> ¬sbit) ∨ (K<sub>R</sub> sbit), expressing that the receiver knows the bit to be sent, by K<sub>R</sub> sbit. We concretise the problem into an epistemically guarded transition system Γ<sub>bt</sub> = (B<sub>bt</sub>, T<sub>bt</sub>) with B<sub>bt</sub> = (S<sub>bt</sub>, E<sub>bt</sub>, L<sub>bt</sub>, S<sub>bt,0</sub>) over Σ<sub>bt</sub> = (P<sub>bt</sub>, A<sub>bt</sub>) with P<sub>bt</sub> = {sbit, rbit, snt, ack} and A<sub>bt</sub> = {S, R}. Since we use a propositional encoding, we represent rval ∈ {⊥, 0, 1} by a proposition rbit for the transmitted bit and a proposition snt for the validity of rbit. Further abbreviating the knowledge guards K<sub>R</sub> sbit by k<sub>r</sub>, K<sub>S</sub> K<sub>R</sub> sbit by k<sub>sr</sub>, and K<sub>R</sub> K<sub>S</sub> K<sub>R</sub> sbit by k<sub>rsr</sub>, the transition system Γ<sub>bt</sub> is graphically given by

The states S<sub>bt</sub> comprise {z<sub>0</sub>, z<sub>1</sub>, . . . , z<sub>7</sub>} with L<sub>bt</sub>(z<sub>0</sub>) = ∅, L<sub>bt</sub>(z<sub>1</sub>) = {snt}, . . . , L<sub>bt</sub>(z<sub>7</sub>) = {sbit, rbit, snt, ack} as outlined in the graph above; the set of initial states is S<sub>bt,0</sub> = {z<sub>0</sub>, z<sub>4</sub>}. The epistemic accessibility relations E<sub>bt,a</sub> for a ∈ A<sub>bt</sub> are given by *observability sets* O<sub>bt,a</sub> that declare two states s<sub>1</sub>, s<sub>2</sub> ∈ S<sub>bt</sub> to be O<sub>bt,a</sub>*-indistinguishable*, written as s<sub>1</sub> ∼<sub>O<sub>bt,a</sub></sub> s<sub>2</sub>, if for all p ∈ O<sub>bt,a</sub> it holds that p ∈ L<sub>bt</sub>(s<sub>1</sub>) ⇐⇒ p ∈ L<sub>bt</sub>(s<sub>2</sub>), and consequently E<sub>bt,a</sub> = ∼<sub>O<sub>bt,a</sub></sub>, such that E<sub>bt,a</sub> forms an equivalence relation. Due to sbit ∉ O<sub>bt,R</sub>, the receiver R cannot "see" sbit and hence cannot distinguish between states z<sub>0</sub> and z<sub>4</sub>, but S can. On the other hand, R can distinguish between z<sub>1</sub> and z<sub>5</sub> as R has access to rbit. Finally, T<sub>bt</sub> consists of two epistemically guarded actions

$$\begin{array}{c} \neg \mathsf{K}\_{\mathsf{S}} \mathsf{K}\_{\mathsf{R}} \; sbit \supset \left\{ (\mathbf{z}\_{i}, \mathbf{z}\_{i}) \mid 0 \le i \le 7 \right\} \cup \left\{ (\mathbf{z}\_{0}, \mathbf{z}\_{1}), (\mathbf{z}\_{2}, \mathbf{z}\_{3}), (\mathbf{z}\_{4}, \mathbf{z}\_{5}), (\mathbf{z}\_{6}, \mathbf{z}\_{7}) \right\} \; \text{and} \\ \mathsf{K}\_{\mathsf{R}} \; sbit \land \neg \mathsf{K}\_{\mathsf{R}} \; \mathsf{K}\_{\mathsf{S}} \; \mathsf{K}\_{\mathsf{R}} \; sbit \supset \left\{ (\mathbf{z}\_{i}, \mathbf{z}\_{i}) \mid 0 \le i \le 7 \right\} \cup \\ \left\{ (\mathbf{z}\_{0}, \mathbf{z}\_{2}), (\mathbf{z}\_{1}, \mathbf{z}\_{3}), (\mathbf{z}\_{4}, \mathbf{z}\_{6}), (\mathbf{z}\_{5}, \mathbf{z}\_{7}) \right\} \end{array}$$

which directly reflect the sending and acknowledging actions of the bit transmission problem: The system can only advance from z<sub>0</sub> to z<sub>1</sub> (and z<sub>4</sub> to z<sub>5</sub>), where sending has been done successfully, if S does not know that R knows the bit; but it need not make such progress, i.e., sending can be unsuccessful. Similarly, the system can only advance from z<sub>1</sub> to z<sub>3</sub> (and z<sub>5</sub> to z<sub>7</sub>), where an acknowledgement has been sent successfully, if R knows the bit and R does not know that S knows that R knows the bit.

(b) Consider the variable setting problem of the introduction for a single agent a:

$$\begin{array}{c} \mathtt{if} \,\mathsf{K}\_{\mathtt{a}}\,\mathsf{x} \neq 1 \rightarrow \mathtt{x} \leftarrow 3\\ \mathtt{[]}\,\mathsf{K}\_{\mathtt{a}}\,\mathsf{x} \neq 3 \rightarrow \mathtt{x} \leftarrow 1 \,\mathtt{fi} \end{array}$$

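Encoding the integer x ∈ {0, 1, 2, 3} by two bits q<sub>1</sub> and q<sub>2</sub>, we model the problem as the following epistemically guarded transition system Γ<sub>vs</sub> = (B<sub>vs</sub>, T<sub>vs</sub>) with B<sub>vs</sub> = (S<sub>vs</sub>, E<sub>vs</sub>, L<sub>vs</sub>, S<sub>vs,0</sub>) over Σ<sub>vs</sub> = (P<sub>vs</sub>, A<sub>vs</sub>) with P<sub>vs</sub> = {q<sub>1</sub>, q<sub>2</sub>} and A<sub>vs</sub> = {a}: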

[Figure: transition graph of Γ<sub>vs</sub> with observability set O<sub>vs,a</sub> = ∅, states s<sub>0</sub>, . . . , s<sub>3</sub> where s<sub>0</sub> is labelled ¬q<sub>1</sub>, ¬q<sub>2</sub> (x = 0), and transitions guarded by K<sub>a</sub> ¬(q<sub>1</sub> ∧ ¬q<sub>2</sub>) and K<sub>a</sub> ¬(¬q<sub>1</sub> ∧ q<sub>2</sub>).]

The empty observability set O<sub>vs,a</sub> = ∅ represents a "blind" agent a that deems all states equally accessible. State s<sub>3</sub> is definitely not reachable. T<sub>vs</sub> consists of the epistemically guarded actions

$$\mathsf{K}\_{\mathsf{a}} \neg (\mathsf{q}\_{1} \land \neg \mathsf{q}\_{2}) \supset \{ (\mathsf{s}\_{0}, \mathsf{s}\_{1}) \} \quad \text{and} \quad \mathsf{K}\_{\mathsf{a}} \neg (\neg \mathsf{q}\_{1} \land \mathsf{q}\_{2}) \supset \{ (\mathsf{s}\_{0}, \mathsf{s}\_{2}) \} \,.$$

#### **3.2 Interpreting Epistemically Guarded Transition Systems**

An epistemically guarded transition system Γ = (S, E, L, S0, T ) over (P, A) is *interpreted* over an epistemic transition structure M ∈ MP,A(S, E, L, S0) by interpreting each guarded action (ϕ ⊃ B) ∈ T w. r.t. M as

$$(\varphi \supset B)^M = \{ (s, s') \in B \mid s \in S\_\omega(M) \text{ and } M, s \models \varphi \} \text{ , }$$

and combining these interpretations into the epistemic transition structure

$$
\Gamma^M = (S, E, L, S\_0, \bigcup\_{\tau \in \mathcal{T}} \tau^M) \,.
$$

We call M a *solution* for Γ if Γ <sup>M</sup> = M.

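*Example 2.* For the bit transmission problem as described in Ex. 1(a), the epistemic transition structure M<sub>bt</sub> = (B<sub>bt</sub>, T<sub>bt</sub>) with T<sub>bt</sub> = {(z<sub>i</sub>, z<sub>i</sub>) | i ∈ {0, 1, 3, 4, 5, 7}} ∪ {(z<sub>0</sub>, z<sub>1</sub>), (z<sub>1</sub>, z<sub>3</sub>), (z<sub>4</sub>, z<sub>5</sub>), (z<sub>5</sub>, z<sub>7</sub>)} satisfies Γ<sub>bt</sub><sup>M<sub>bt</sub></sup> = M<sub>bt</sub>. This structure just omits the states z<sub>2</sub> and z<sub>6</sub> with L<sub>bt</sub>(z<sub>2</sub>) = {ack} and L<sub>bt</sub>(z<sub>6</sub>) = {sbit, ack}, which are definitely not reachable, as K<sub>R</sub> sbit is false in z<sub>0</sub> ∼<sub>O<sub>bt,R</sub></sub> z<sub>4</sub>. Indeed,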

$$\begin{aligned} M\_{bt}, s &\models \neg \mathsf{K}\_{\mathsf{S}} \mathsf{K}\_{\mathsf{R}} \, sbit \iff s \in \{\mathsf{z}\_{0}, \mathsf{z}\_{1}, \mathsf{z}\_{4}, \mathsf{z}\_{5}\} \\ M\_{bt}, s &\models \mathsf{K}\_{\mathsf{R}} \, sbit \iff s \in \{\mathsf{z}\_{1}, \mathsf{z}\_{3}, \mathsf{z}\_{5}, \mathsf{z}\_{7}\} \\ M\_{bt}, s &\models \neg \mathsf{K}\_{\mathsf{R}} \, \mathsf{K}\_{\mathsf{S}} \, \mathsf{K}\_{\mathsf{R}} \, sbit \iff s \in \{\mathsf{z}\_{0}, \mathsf{z}\_{1}, \mathsf{z}\_{3}, \mathsf{z}\_{4}, \mathsf{z}\_{5}, \mathsf{z}\_{7}\} \end{aligned}$$
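The claim of Ex. 2 can be checked mechanically. The sketch below interprets the two guarded actions of Ex. 1(a) over the given transition relation and compares the result with that relation. Several ingredients are assumptions inferred from the surrounding text rather than stated in it: the labels of z3, z4 and z5, and the observability sets `OBS` (chosen as {sbit, ack} for S and {rbit, snt} for R, which is consistent with the three equivalences listed above). Formulas are encoded as nested tuples, which is purely an illustrative choice.

```python
# Labelling of z0..z7; entries for z3, z4, z5 are inferred, not quoted.
LABEL = {
    0: set(), 1: {"snt"}, 2: {"ack"}, 3: {"snt", "ack"},
    4: {"sbit"}, 5: {"sbit", "rbit", "snt"},
    6: {"sbit", "ack"}, 7: {"sbit", "rbit", "snt", "ack"},
}
OBS = {"S": {"sbit", "ack"}, "R": {"rbit", "snt"}}  # assumed observability sets


def indist(agent, s, t):
    """Two states agree on every proposition the agent observes."""
    return all((p in LABEL[s]) == (p in LABEL[t]) for p in OBS[agent])


def reach(init, trans):
    """Reachable states S_omega for a finite transition relation."""
    seen, todo = set(init), list(init)
    while todo:
        s = todo.pop()
        for (u, v) in trans:
            if u == s and v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def holds(f, s, R):
    """Satisfaction at s; knowledge quantifies over reachable states R only."""
    op = f[0]
    if op == "p":
        return f[1] in LABEL[s]
    if op == "not":
        return not holds(f[1], s, R)
    if op == "and":
        return holds(f[1], s, R) and holds(f[2], s, R)
    if op == "or":
        return holds(f[1], s, R) or holds(f[2], s, R)
    if op == "K":
        return all(holds(f[2], t, R) for t in R if indist(f[1], s, t))
    raise ValueError(op)


def interpret(actions, init, trans):
    """Transition relation of Gamma^M for M = (basis, trans)."""
    R = reach(init, trans)
    return {(s, t) for (guard, B) in actions for (s, t) in B
            if s in R and holds(guard, s, R)}


sbit = ("p", "sbit")
kr = ("or", ("K", "R", sbit), ("K", "R", ("not", sbit)))  # R knows the bit
ksr = ("K", "S", kr)
krsr = ("K", "R", ksr)
ID = {(i, i) for i in range(8)}
ACTIONS = [
    (("not", ksr), ID | {(0, 1), (2, 3), (4, 5), (6, 7)}),
    (("and", kr, ("not", krsr)), ID | {(0, 2), (1, 3), (4, 6), (5, 7)}),
]
INIT = {0, 4}
T_BT = {(i, i) for i in (0, 1, 3, 4, 5, 7)} | {(0, 1), (1, 3), (4, 5), (5, 7)}
is_solution = interpret(ACTIONS, INIT, T_BT) == T_BT
```

Under these assumptions `is_solution` evaluates to `True`, and the reachable states are exactly z0, z1, z3, z4, z5, z7.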

However, finding a solution is complicated by the fact that the functional of interpreting an epistemically guarded transition system over an epistemic transition structure is not monotone, in general, as illustrated by the following examples.

*Example 3.* (a) Continuing Ex. 1(b) for the variable setting problem Γ<sub>vs</sub>, consider the epistemic transition structure M<sub>vs,0</sub> ∈ M<sub>Σ<sub>vs</sub></sub>(B<sub>vs</sub>) with the empty transition relation T(M<sub>vs,0</sub>) = ∅, and hence S<sub>0</sub>(M<sub>vs,0</sub>) = {s<sub>0</sub>}. Setting M<sub>vs,i+1</sub> = Γ<sub>vs</sub><sup>M<sub>vs,i</sub></sup> for 0 ≤ i ≤ 2 we obtain successively


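In particular, M<sub>vs,2</sub> = Γ<sub>vs</sub><sup>M<sub>vs,1</sub></sup> = Γ<sub>vs</sub><sup>Γ<sub>vs</sub><sup>M<sub>vs,0</sub></sup></sup> = M<sub>vs,0</sub>. However, for M<sub>vs,4</sub>, M<sub>vs,5</sub> ∈ M<sub>Σ<sub>vs</sub></sub>(B<sub>vs</sub>) with T(M<sub>vs,4</sub>) = {(s<sub>0</sub>, s<sub>1</sub>)} and T(M<sub>vs,5</sub>) = {(s<sub>0</sub>, s<sub>2</sub>)} we obtain that Γ<sub>vs</sub><sup>M<sub>vs,4</sub></sup> = M<sub>vs,4</sub> and Γ<sub>vs</sub><sup>M<sub>vs,5</sub></sup> = M<sub>vs,5</sub>.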

(b) For capturing the cycle-breaking variable setting of the introduction consider the following epistemically guarded transition system Γvsb = (Bvs , Tvsb) over Σvs that shares the epistemic state basis Bvs with Ex. 1(b):

[Figure: transition graph of Γ<sub>vsb</sub> over the states of B<sub>vs</sub> with observability set O<sub>vs,a</sub> = ∅ and transitions guarded by K<sub>a</sub> ¬(q<sub>1</sub> ∧ ¬q<sub>2</sub>) and K<sub>a</sub> ¬(¬q<sub>1</sub> ∧ q<sub>2</sub>).]

For M<sub>vsb,0</sub> = (B<sub>vs</sub>, ∅) with S<sub>0</sub>(M<sub>vsb,0</sub>) = {s<sub>0</sub>}, and setting M<sub>vsb,i+1</sub> = Γ<sub>vsb</sub><sup>M<sub>vsb,i</sub></sup> for 0 ≤ i ≤ 3 we obtain successively


For M<sub>vsb,3</sub> with S<sub>ω</sub>(M<sub>vsb,3</sub>) = {s<sub>0</sub>, s<sub>1</sub>, s<sub>3</sub>} it finally holds that Γ<sub>vsb</sub><sup>M<sub>vsb,3</sub></sup> = M<sub>vsb,3</sub>.
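The non-monotone behaviour of Ex. 3(a) can be reproduced in a few lines. The sketch below specialises the interpretation to the single blind agent of Γvs, for whom K ϕ holds exactly if ϕ holds in every reachable state. The state labelling is an assumption chosen so that the guarded actions of Ex. 1(b) behave as described in Ex. 3(a): s1 satisfies ¬q1 ∧ q2 and s2 satisfies q1 ∧ ¬q2.

```python
# Assumed labelling of s0..s3 (inferred from the behaviour in Ex. 3(a)).
LABEL = {0: set(), 1: {"q2"}, 2: {"q1"}, 3: {"q1", "q2"}}
INIT = {0}


def q1_only(s):
    return LABEL[s] == {"q1"}   # q1 and not q2


def q2_only(s):
    return LABEL[s] == {"q2"}   # not q1 and q2


# Each entry (bad, B) stands for the guarded action  K_a not(bad) ⊃ B.
ACTIONS = [(q1_only, {(0, 1)}), (q2_only, {(0, 2)})]


def reach(trans):
    seen = set(INIT)
    changed = True
    while changed:
        changed = False
        for (s, t) in trans:
            if s in seen and t not in seen:
                seen.add(t)
                changed = True
    return seen


def interpret(trans):
    """Transition relation of Gamma_vs^M for M with transition relation trans."""
    R = reach(trans)
    return frozenset((s, t) for (bad, B) in ACTIONS for (s, t) in B
                     if s in R and not any(bad(u) for u in R))


M0 = frozenset()
M1 = interpret(M0)            # both guards hold: two transitions appear
M2 = interpret(M1)            # both guards fail: back to the empty relation
M4 = frozenset({(0, 1)})
M5 = frozenset({(0, 2)})
```

Here `M2 == M0` exhibits the two-cycle, while `interpret(M4) == M4` and `interpret(M5) == M5` are the two solutions. Non-monotonicity shows in `M0 <= M4` holding although `interpret(M0) <= interpret(M4)` fails.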

#### **3.3 Iteration Semantics**

For illustrating the non-monotonicity of the interpretation functional we have started the interpretation sequence for Γ with the smallest epistemic transition structure, which suggests looking for a smallest fixed point, which need not exist. De Haan et al. [10] argue that a substitute consisting of the greatest fixed point would be more liberal. They construct a transfinite approximation sequence starting from an N<sub>0</sub> having all states reachable. For a successor ordinal α+1, the approximation N<sub>α+1</sub> is just the interpretation of Γ in N<sub>α</sub>; for a limit ordinal λ, the approximation N<sub>λ</sub> = ⋂<sub>α<λ</sub> ⋃<sub>α≤β<λ</sub> N<sub>β</sub> is "the intersection of unions of approximations that are sufficiently close to the limit" [10, p. 269]. The latter is preferred over a union of intersections as it includes more states which implies less knowledge, such that "agents [know] facts only when there are good reasons for them" (ibid.). Due to cardinality reasons, the ordinal η<sub>Γ</sub> = inf{α | ex. β s. t. α < β and N<sub>α</sub> = N<sub>β</sub>} exists. If N<sub>α+1</sub> ⊆ N<sub>α</sub> for all α ≥ η<sub>Γ</sub>, then N<sub>η<sub>Γ</sub>+1</sub> = N<sub>η<sub>Γ</sub></sub>; otherwise there is some α ≥ η<sub>Γ</sub> such that N<sub>α+1</sub> ⊈ N<sub>α</sub>. Thus α<sub>Γ</sub> = inf{α | η<sub>Γ</sub> ≤ α and (N<sub>α</sub> = N<sub>α+1</sub> or N<sub>α+1</sub> ⊈ N<sub>α</sub>)} exists and the *iteration semantics* of Γ is defined as N<sub>α<sub>Γ</sub></sub>. This yields the greatest fixed point if the interpretation functional is monotone.

*Example 4.* (a) For the variable setting problem Γ<sub>vs</sub> of Ex. 1(b) the interpretation sequence (N<sub>vs,α</sub>)<sub>0≤α</sub> starts with N<sub>vs,0</sub> showing T(N<sub>vs,0</sub>) = S<sub>vs</sub> × S<sub>vs</sub>. Using the epistemic transition structures from Ex. 3(a) it holds that N<sub>vs,k+1</sub> = Γ<sub>vs</sub><sup>N<sub>vs,k</sub></sup> = M<sub>vs,2</sub> for k even and N<sub>vs,k+1</sub> = M<sub>vs,1</sub> for k ≥ 1 odd. Thus, N<sub>vs,1</sub> = N<sub>vs,3</sub> such that η<sub>Γ<sub>vs</sub></sub> = 1 = α<sub>Γ<sub>vs</sub></sub>, since T(N<sub>vs,2</sub>) = {(s<sub>0</sub>, s<sub>0</sub>), (s<sub>0</sub>, s<sub>1</sub>), (s<sub>0</sub>, s<sub>2</sub>)} ⊈ ∅ = T(N<sub>vs,1</sub>). Hence the iteration semantics of Γ<sub>vs</sub> is given by N<sub>vs,1</sub> = M<sub>vs,2</sub>; since its transition relation is empty, Γ<sub>vs</sub> has the same iteration semantics as an epistemically guarded transition system without any guarded actions.

(b) Computing the iteration semantics sequence (N<sub>vsb,α</sub>)<sub>0≤k</sub> of the cycle-breaking variable setting Γ<sub>vsb</sub> of Ex. 3(b) proceeds as N<sub>vsb,k</sub> = M<sub>vsb,k+1</sub>. Since this time the functional is monotone from α = 1 onwards, the iteration semantics is N<sub>vsb,2</sub>.

(c) Consider the following epistemically guarded transition system Γ<sub>nc</sub> = (B<sub>vs</sub>, T<sub>nc</sub>) over Σ<sub>vs</sub> that shares the epistemic basis B<sub>vs</sub> with the variable setting problem Γ<sub>vs</sub> of (a) and only adds the guarded action K<sub>a</sub> ¬q<sub>2</sub> ⊃ {(s<sub>0</sub>, s<sub>3</sub>)}:

The interpretation process runs as for Γ<sub>vs</sub>, and the epistemic transition structure with the empty transition relation is also the iteration semantics of Γ<sub>nc</sub>. This time, however, there is a unique non-empty interpretation, viz. the transition structure consisting only of (s<sub>0</sub>, s<sub>1</sub>). Finding this solution is not constructive and some speculation is necessary: there is no solution where s<sub>2</sub> is reachable; if s<sub>2</sub> were reachable, then s<sub>1</sub> would be reachable leading to a contradiction due to the (non-)reachability of s<sub>3</sub>. Thus only the possibility of s<sub>0</sub> and s<sub>1</sub> being reachable, and s<sub>2</sub> and s<sub>3</sub> unreachable, remains.

(d) For the epistemically guarded transition system Γmay over ({p}, {a}) given by

[Figure: transition graph of Γ<sub>may</sub> with observability set O<sub>may,a</sub> = ∅, states u<sub>0</sub> (labelled ¬p) and u<sub>1</sub>, and a transition from u<sub>0</sub> to u<sub>1</sub> guarded by M<sub>a</sub> p.]

the iteration process when started with N<sub>may,0</sub> having T(N<sub>may,0</sub>) = {u<sub>0</sub>, u<sub>1</sub>} × {u<sub>0</sub>, u<sub>1</sub>} evaluates M<sub>a</sub> p to true and we obtain N<sub>may,1</sub> with T(N<sub>may,1</sub>) = {(u<sub>0</sub>, u<sub>1</sub>)}, which in turn is confirmed by the next iteration yielding a fixed point. This iteration semantics, however, has a touch of a "vaticinium ex eventu": p can be reached since p may be reached.
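For finite state sets the approximation sequence becomes eventually periodic after finitely many steps, so limit ordinals never come into play. Under that assumption the iteration semantics of Sect. 3.3 can be sketched as follows: iterate from the full transition relation, record the sequence until a value repeats to obtain η, then search for the first index from η on at which the sequence is stationary or fails to shrink. The sketch reuses the blind-agent encoding of Γvs with the same assumed labelling as in Ex. 3(a); the function names are hypothetical.

```python
from itertools import product

# Assumed labelling of s0..s3: s1 satisfies (not q1 and q2), s2 (q1 and not q2).
LABEL = {0: set(), 1: {"q2"}, 2: {"q1"}, 3: {"q1", "q2"}}
INIT = {0}
# Each entry (bad, B) stands for the guarded action  K_a not(bad) ⊃ B.
ACTIONS = [(lambda s: LABEL[s] == {"q1"}, {(0, 1)}),
           (lambda s: LABEL[s] == {"q2"}, {(0, 2)})]


def reach(trans):
    seen = set(INIT)
    changed = True
    while changed:
        changed = False
        for (s, t) in trans:
            if s in seen and t not in seen:
                seen.add(t)
                changed = True
    return seen


def interpret(trans):
    R = reach(trans)
    return frozenset((s, t) for (bad, B) in ACTIONS for (s, t) in B
                     if s in R and not any(bad(u) for u in R))


def iteration_semantics(top):
    """Finite-case sketch: returns (alpha, N_alpha)."""
    seq = [top]
    while True:
        nxt = interpret(seq[-1])
        if nxt in seq:
            eta = seq.index(nxt)   # least index whose value recurs later
            seq.append(nxt)        # seq[eta:] now spans one full period
            break
        seq.append(nxt)
    for alpha in range(eta, len(seq) - 1):
        # Stop at a fixed point or where the next approximation is no subset.
        if seq[alpha] == seq[alpha + 1] or not seq[alpha + 1] <= seq[alpha]:
            return alpha, seq[alpha]


TOP = frozenset(product(range(4), repeat=2))   # N_0: all states reachable
alpha, N = iteration_semantics(TOP)
```

For Γvs this yields index 1 and the empty transition relation, in line with Ex. 4(a).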

#### **3.4 Unique Interpretation Solutions**

A knowledge-based program can be executed reliably just step by step if each knowledge guard can be stably decided based on what has been computed up to the current point of execution. In particular, in order to obtain a solution by execution, knowledge must not be invalidated by information only to be gained later on. Conversely, if all knowledge guards can be decided by just looking to the past, there is at most a single solution.

Based on this observation, Fagin et al. [13,14] develop a formal characterisation of unique interpretability by capturing the notion that solutions "depend on the past". They then show that "providing epistemic witnesses" is a sufficient criterion for "dependence on the past", which in turn always holds for "synchronous" programs. We briefly summarise their main line of argument adapting the demonstration from their run-based account for knowledge-based programs to our state-based epistemically guarded transition systems.<sup>3</sup>

<sup>3</sup> The proofs are available in a long version at https://arxiv.org/abs/2301.10807.

An epistemic formula ϕ ∈ Φ<sub>P,A</sub> is said to *depend on the past* w. r.t. a class of epistemic transition structures ℳ ⊆ M<sub>P,A</sub>(B) if for all M<sub>1</sub>, M<sub>2</sub> ∈ ℳ and all k ∈ ℕ it holds that T<sub>k</sub>(M<sub>1</sub>) = T<sub>k</sub>(M<sub>2</sub>) implies M<sub>1</sub>, s ⊨ ϕ ⇐⇒ M<sub>2</sub>, s ⊨ ϕ for all s ∈ S<sub>k</sub>(M<sub>1</sub>) ∩ S<sub>k</sub>(M<sub>2</sub>); an epistemically guarded transition system Γ = (B, T) over (P, A) is *depending on the past* w. r.t. ℳ if every ϕ in (ϕ ⊃ B) ∈ T depends on the past w. r.t. ℳ.

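*Example 5.* For Ex. 3(a) neither K<sub>a</sub> ¬(q<sub>1</sub> ∧ ¬q<sub>2</sub>) nor K<sub>a</sub> ¬(¬q<sub>1</sub> ∧ q<sub>2</sub>) depends on the past w. r.t. {M<sub>vs,0</sub>, M<sub>vs,1</sub>}. In particular, T<sub>0</sub>(M<sub>vs,0</sub>) = ∅ = T<sub>0</sub>(M<sub>vs,1</sub>) and S<sub>0</sub>(M<sub>vs,0</sub>) = {s<sub>0</sub>} = S<sub>0</sub>(M<sub>vs,1</sub>), but M<sub>vs,0</sub>, s<sub>0</sub> ⊨ K<sub>a</sub> ¬(q<sub>1</sub> ∧ ¬q<sub>2</sub>) and M<sub>vs,1</sub>, s<sub>0</sub> ⊭ K<sub>a</sub> ¬(q<sub>1</sub> ∧ ¬q<sub>2</sub>). Similarly for Ex. 3(b), these two formulæ do not depend on the past w. r.t. {M<sub>vsb,0</sub>, M<sub>vsb,1</sub>, M<sub>vsb,2</sub>, M<sub>vsb,3</sub>}, but they do w. r.t. {M<sub>vsb,1</sub>, M<sub>vsb,2</sub>, M<sub>vsb,3</sub>}.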

An epistemically guarded transition system Γ has at most one solution if, and only if, it depends on the past w. r.t. all its solutions. Due to the dependence on the past the successive reachable transition relations Tk(M) of all solutions M = Γ <sup>M</sup>, i.e., their pasts, coincide.

**Proposition 1.** *Let* Γ = (B, T) *be an epistemically guarded transition system over* Σ*. Then* Γ *has at most one solution if, and only if, there is an* ℳ ⊆ M<sub>Σ</sub>(B) *with* {M ∈ M<sub>Σ</sub>(B) | Γ<sup>M</sup> = M} ⊆ ℳ *such that* Γ *depends on the past w. r.t.* ℳ*.*

In order to obtain a solution of Γ by execution, the system is interpreted repeatedly to construct the approximations (M<sub>k</sub>)<sub>0≤k</sub> with M<sub>k+1</sub> = Γ<sup>M<sub>k</sub></sup> for k ≥ −1 starting with some M<sub>−1</sub>. Each approximation M<sub>k</sub> with k ≥ 0 contributes a transition relation T<sub>k</sub>(M<sub>k</sub>) which can be combined into a limit M<sub>ω</sub>. If Γ depends on the past w. r.t. the class of epistemic transition structures from which the approximands are constructed and which also contains the limit, then the interpretation of the limit M<sub>ω</sub> yields a fixed point.

**Proposition 2.** *Let* Γ = (B, T) *be an epistemically guarded transition system over* Σ*, let* ℳ ⊆ M<sub>Σ</sub>(B) *such that* Γ<sup>M</sup> ∈ ℳ *for every* M ∈ ℳ *and* (B, ⋃<sub>0≤k</sub> T<sub>k</sub>(M<sub>k</sub>)) ∈ ℳ *for all* (M<sub>k</sub>)<sub>0≤k</sub> ⊆ ℳ *with* T<sub>k</sub>(M<sub>k′</sub>) = T<sub>k</sub>(M<sub>k</sub>) *for all* k′ ≥ k ≥ 0*, and let* Γ *depend on the past w. r.t.* ℳ*. Let* M<sub>−1</sub> ∈ ℳ*,* M<sub>i+1</sub> = Γ<sup>M<sub>i</sub></sup> *for all* i ≥ −1*, and* M<sub>ω</sub> = (B, ⋃<sub>0≤k</sub> T<sub>k</sub>(M<sub>k</sub>))*. Then* Γ<sup>M<sub>ω</sub></sup> = Γ<sup>Γ<sup>M<sub>ω</sub></sup></sup>*.*

A sufficient criterion for obtaining a comprehensive class of epistemic transition structures ℳ such that Γ depends on the past w. r.t. ℳ is provided by epistemic witnesses: if some knowledge formula K<sub>a</sub> ϕ of Γ does not hold at some state of an interpreting epistemic transition structure, there is evidence in the past of this structure why it does not hold. Formally, a structure M ∈ M<sub>P,A</sub>(B) *provides epistemic witnesses* for a formula K<sub>a</sub> ϕ ∈ Φ<sub>P,A</sub> if for all k ≥ 0, s ∈ S<sub>k</sub>(M) it holds that if M, s ⊭ K<sub>a</sub> ϕ, then there is an s′ ∈ S<sub>k</sub>(M) with (s, s′) ∈ E<sub>a</sub> and M, s′ ⊭ ϕ.

**Lemma 1.** *Let* Γ = (B, T) *be an epistemically guarded transition system over* Σ *and let* ℳ ⊆ M<sub>Σ</sub>(B) *such that all* M ∈ ℳ *provide epistemic witnesses for all knowledge guards in* Γ*. Then* Γ *is depending on the past w. r.t.* ℳ*.*

A sufficient criterion, in turn, for a structure M ∈ M<sub>P,A</sub>(S, E, L, S<sub>0</sub>) to provide epistemic witnesses is M being *synchronous*: for all a ∈ A and all reachable s<sub>1</sub> ∈ S<sub>k<sub>1</sub></sub>(M) and s<sub>2</sub> ∈ S<sub>k<sub>2</sub></sub>(M) with (s<sub>1</sub>, s<sub>2</sub>) ∈ E<sub>a</sub> it holds that s<sub>1</sub>, s<sub>2</sub> ∈ S<sub>min{k<sub>1</sub>,k<sub>2</sub>}</sub>(M). In a synchronous structure the temporal and the epistemic dimension for each agent are hence tightly coupled and agents cannot access the future, but also do not need to know the future.

*Example 6.* The interpretation M<sub>bt</sub> of the bit transmission problem given in Ex. 2 provides epistemic witnesses, but is not synchronous: the sender S cannot distinguish z<sub>0</sub> reachable at depth 0 of M<sub>bt</sub> from z<sub>1</sub> that is only reachable at depth 1, and similarly the receiver R cannot distinguish z<sub>1</sub> from z<sub>3</sub> at the respective depths of 1 and 2.
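Because the sets S<sub>k</sub>(M) are cumulative, the synchrony condition appears to amount to requiring that any two reachable, epistemically related states are first reached at the same depth. The sketch below checks this reading on the structure of Ex. 2. As before, the labels of z3, z4, z5 and the observability sets `OBS` are assumptions inferred from the text, and the function names are hypothetical.

```python
# Labelling of z0..z7; entries for z3, z4, z5 are inferred, not quoted.
LABEL = {
    0: set(), 1: {"snt"}, 2: {"ack"}, 3: {"snt", "ack"},
    4: {"sbit"}, 5: {"sbit", "rbit", "snt"},
    6: {"sbit", "ack"}, 7: {"sbit", "rbit", "snt", "ack"},
}
OBS = {"S": {"sbit", "ack"}, "R": {"rbit", "snt"}}  # assumed observability sets
INIT = {0, 4}
T_BT = {(i, i) for i in (0, 1, 3, 4, 5, 7)} | {(0, 1), (1, 3), (4, 5), (5, 7)}


def indist(agent, s, t):
    return all((p in LABEL[s]) == (p in LABEL[t]) for p in OBS[agent])


def depths(init, trans):
    """Map each reachable state s to the least k with s in S_k(M)."""
    depth = {s: 0 for s in init}
    frontier, k = set(init), 0
    while frontier:
        k += 1
        frontier = {t for (s, t) in trans if s in frontier and t not in depth}
        for t in frontier:
            depth[t] = k
    return depth


def violations(init, trans):
    """Triples (agent, s, t): related reachable states of different depth."""
    d = depths(init, trans)
    return {(a, s, t) for a in OBS for s in d for t in d
            if indist(a, s, t) and d[s] < d[t]}


def is_synchronous(init, trans):
    return not violations(init, trans)


DEPTH = depths(INIT, T_BT)
SYNC = is_synchronous(INIT, T_BT)
```

With these assumptions the check reports the two witnesses named in Ex. 6 among the violations: S relates z0 (depth 0) to z1 (depth 1), and R relates z1 (depth 1) to z3 (depth 2).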

An epistemically guarded transition system Γ = (B, T ) over Σ *provides epistemic witnesses* if for each M ∈ MΣ(B) the interpretation Γ <sup>M</sup> provides epistemic witnesses for all knowledge formulæ occurring in some of the action guards of Γ; Γ is *synchronous* if each Γ <sup>M</sup> is synchronous. Moreover, Γ can syntactically be seen to be synchronous (cf. [14, p. 135]) if it is round-based where all agents perform some action in each round and record locally which actions they have taken.

# **4 (Re-)Interpreting Knowledge-based Programs**

The results by Fagin et al. [13,14] guarantee a unique interpretation for all synchronous knowledge-based programs; the approach by De Haan et al. [10] aims at extending the interpretation to asynchronous programs, but assigns semantics also to contradictory or self-fulfilling programs.

The necessity of avoiding contradictory or self-fulfilling behaviour already occurs in the design of synchronous programming languages [6]: Their underlying principle is "perfect synchrony", that any reaction of a program takes zero time and that thus whatever is output in reaction to some input is already present at the same time as the input. Since the presence or absence of signals can be tested, this requires "logical coherence" saying that a (non-input) signal is present in a reaction if, and only if, this signal is emitted in this very reaction. A program needs to be both *reactive* in the sense of leading to some logically coherent signal status, and *determinate*, i.e., not showing several such statuses. For example, in Esterel [7], the program fragment

```
present S then nothing else emit S end
```
is not reactive, but contradictory: signal S is only emitted if it is not emitted; and

`present S then emit S else nothing end`

is not determinate, but self-fulfilling: S is emitted if it is emitted, and it is not emitted if it is not. Such programs can be revealed by using a cycle-detecting static analysis, as is done in Lustre [18], or, for including more intricate cases, by Berry's "constructive semantics" as for Esterel [8]. Building on a "logical semantics" recording what is emitted in each step of execution, a *must*/*cannot* analysis is performed: what must/cannot be emitted, which branch must/cannot be executed. It is then required that for each signal it can be decided whether it must be present or it cannot be present. For example, in the parallel execution

```
[ present S1 then emit S1 end ]
```
both signals can be emitted — if S1 is assumed to be present, and S2 absent —, but none must be emitted. Thus the constructive semantics does not reach a decision of what must/cannot be present and the program is not constructive. Intriguingly, however, there is exactly one coherent signal status that can be reached by execution: S1 and S2 absent.

We adapt Berry's constructive semantics approach to knowledge-based programs. In fact, the first, non-reactive Esterel program fragment resembles the variable setting problem described in Ex. 3(a), the second, non-determinate fragment directly corresponds to Ex. 4(d), and the last, combined fragment is essentially the same as Ex. 4(c). We first define a must/can version of epistemic transition structures with a lower (must) and an upper bound (can). Based on a positive (must) and negative (cannot) satisfaction relation of epistemic formulæ over these structures we show how an epistemically guarded transition system can be interpreted yielding another epistemic must/can transition structure. For uniformity, we rephrase this interpretation in terms of the negation normal form of formulæ and demonstrate that the constructive interpretation is always monotone and leads to a least fixed point. For any knowledge-based program, this fixed point soundly shows which executions are necessary and which are possible. However, the fixed point need not be decided, and more can be possible than is necessary. We show that synchronous programs always lead to decided fixed points.

#### **4.1 Epistemic Must/Can Transition Structures**

An *epistemic must/can transition structure* Y = (S, E, L, S<sub>0</sub>, (T<sub>µ</sub>, T<sub>ν</sub>)) over Σ = (P, A) is given by an epistemic state basis B = (S, E, L, S<sub>0</sub>) and two *lower* and *upper transition relations* T<sub>µ</sub>, T<sub>ν</sub> ⊆ S × S with T<sub>µ</sub> ⊆ T<sub>ν</sub>. In particular, Y<sub>µ</sub> = (B, T<sub>µ</sub>) and Y<sub>ν</sub> = (B, T<sub>ν</sub>) are epistemic transition structures over Σ with Y<sub>µ</sub> ⊆ Y<sub>ν</sub>.

The *positive* and *negative satisfaction relations* of an epistemic formula ϕ ∈ ΦP,A over the epistemic must/can transition structure Y at a state s ∈ Sω(Yν), written Y, s |=<sup>p</sup> ϕ and Y, s |=<sup>n</sup> ϕ, are defined as follows:

$$\begin{array}{ll} Y, s \models^{p} p \iff p \in L(s) & Y, s \models^{n} p \iff p \notin L(s) \\ Y, s \not\models^{p} \mathrm{false} & Y, s \models^{n} \mathrm{false} \\ Y, s \models^{p} \neg\varphi \iff Y, s \models^{n} \varphi & Y, s \models^{n} \neg\varphi \iff Y, s \models^{p} \varphi \\ Y, s \models^{p} \varphi\_1 \land \varphi\_2 \iff Y, s \models^{p} \varphi\_1 \text{ and } Y, s \models^{p} \varphi\_2 & Y, s \models^{n} \varphi\_1 \land \varphi\_2 \iff Y, s \models^{n} \varphi\_1 \text{ or } Y, s \models^{n} \varphi\_2 \\ Y, s \models^{p} \mathsf{K}\_a \varphi \iff Y, s' \models^{p} \varphi & Y, s \models^{n} \mathsf{K}\_a \varphi \iff Y, s' \models^{n} \varphi \\ \quad \text{for all } s' \in S\_\omega(Y\_\nu) \text{ with } (s, s') \in E\_a & \quad \text{for some } s' \in S\_\omega(Y\_\mu) \text{ with } (s, s') \in E\_a \end{array}$$

A formula is positively satisfied over Y if it must be true given the upper bound Y<sub>ν</sub> of possible behaviour; it is negatively satisfied if it cannot be true given the lower bound Y<sub>µ</sub> of necessary behaviour. In fact, it holds that what must be true can also be true:<sup>4</sup>

**Lemma 2.** *Let* Y = (S, E, L, S<sub>0</sub>, (T<sub>µ</sub>, T<sub>ν</sub>)) *be an epistemic must/can transition structure over* (P, A) *and* ϕ ∈ Φ<sub>P,A</sub>*. Then for all* s ∈ S<sub>ω</sub>(Y<sub>ν</sub>)*,* Y, s ⊨<sup>p</sup> ϕ *implies* Y, s ⊭<sup>n</sup> ϕ*.*
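The two satisfaction relations translate into a pair of mutually recursive functions. The following sketch evaluates a guard of the variable setting problem over two must/can structures, represented only by their lower and upper reachable state sets. The tuple encoding of formulas, the dictionary layout of `Y`, and the state labelling are illustrative assumptions.

```python
def sat_p(f, s, Y):
    """Positive satisfaction: f must hold at s."""
    op = f[0]
    if op == "p":
        return f[1] in Y["label"][s]
    if op == "false":
        return False
    if op == "not":
        return sat_n(f[1], s, Y)
    if op == "and":
        return sat_p(f[1], s, Y) and sat_p(f[2], s, Y)
    if op == "K":   # universal over the upper bound S_omega(Y_nu)
        return all(sat_p(f[2], t, Y) for t in Y["up"] if Y["indist"](f[1], s, t))
    raise ValueError(op)


def sat_n(f, s, Y):
    """Negative satisfaction: f cannot hold at s."""
    op = f[0]
    if op == "p":
        return f[1] not in Y["label"][s]
    if op == "false":
        return True
    if op == "not":
        return sat_p(f[1], s, Y)
    if op == "and":
        return sat_n(f[1], s, Y) or sat_n(f[2], s, Y)
    if op == "K":   # existential over the lower bound S_omega(Y_mu)
        return any(sat_n(f[2], t, Y) for t in Y["low"] if Y["indist"](f[1], s, t))
    raise ValueError(op)


# Assumed labelling of s0..s3; the blind agent relates all states.
LABEL = {0: set(), 1: {"q2"}, 2: {"q1"}, 3: {"q1", "q2"}}
BLIND = lambda agent, s, t: True
GUARD = ("K", "a", ("not", ("and", ("p", "q1"), ("not", ("p", "q2")))))

Y_BOTTOM = {"label": LABEL, "indist": BLIND, "low": {0}, "up": {0, 1, 2, 3}}
Y_DECIDED = {"label": LABEL, "indist": BLIND, "low": {0, 1}, "up": {0, 1}}

undetermined = (sat_p(GUARD, 0, Y_BOTTOM), sat_n(GUARD, 0, Y_BOTTOM))
determined = (sat_p(GUARD, 0, Y_DECIDED), sat_n(GUARD, 0, Y_DECIDED))
```

Over `Y_BOTTOM` the guard is neither positively nor negatively satisfied, which is the gap between must and cannot; over `Y_DECIDED` it is positively satisfied and, in line with Lem. 2, not negatively satisfied.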

<sup>4</sup> The proofs are available in a long version at https://arxiv.org/abs/2301.10807.

The set of epistemic must/can transition structures over Σ and the epistemic state basis B is denoted by Y<sub>Σ</sub>(B). We say that Y<sub>1</sub> ⊑ Y<sub>2</sub> for Y<sub>1</sub>, Y<sub>2</sub> ∈ Y<sub>Σ</sub>(B) if Y<sub>1,µ</sub> ⊆ Y<sub>2,µ</sub> and Y<sub>1,ν</sub> ⊇ Y<sub>2,ν</sub>: an *extension* raises the lower bound and reduces the upper bound.

As with epistemic transition structures, an epistemically guarded transition system Γ = (S, E, L, S<sub>0</sub>, T) over (P, A) can be interpreted over an epistemic must/can transition structure Y ∈ Y<sub>P,A</sub>(S, E, L, S<sub>0</sub>): the *interpretation* of a guarded action (ϕ ⊃ B) ∈ T w. r.t. Y is given by the pair (ϕ ⊃ B)<sup>Y</sup> = ((ϕ ⊃ B)<sup>Y,µ</sup>, (ϕ ⊃ B)<sup>Y,ν</sup>) with

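$$\begin{aligned} (\varphi \supset B)^{Y,\mu} &= \{ (s, s') \in B \mid s \in S\_\omega(Y\_\mu) \text{ and } Y, s \models^{p} \varphi \} \,, \\ (\varphi \supset B)^{Y,\nu} &= \{ (s, s') \in B \mid s \in S\_\omega(Y\_\nu) \text{ and } Y, s \not\models^{n} \varphi \} \,. \end{aligned}$$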

By Lem. 2 it holds that τ<sup>Y,µ</sup> ⊆ τ<sup>Y,ν</sup> for each τ ∈ T. The *constructive interpretation* of Γ w. r.t. Y is given by the epistemic must/can transition structure

$$\Gamma^Y = (S, E, L, S\_0, (\bigcup\_{\tau \in \mathcal{T}} \tau^{Y, \mu}, \bigcup\_{\tau \in \mathcal{T}} \tau^{Y, \nu})) \,.$$

This is well defined, i.e., (Γ<sup>Y</sup>)<sub>µ</sub> ⊆ (Γ<sup>Y</sup>)<sub>ν</sub>. We call Y a *constructive solution* for Γ if Γ<sup>Y</sup> = Y; a constructive solution is *decided* if Y<sub>µ</sub> = Y<sub>ν</sub>.

Again as with epistemic transition structures, this interpretation over epistemic must/can transition structures can be iterated for finally reaching a stable structure — and this time interpretation turns out to be monotone.

*Example 7.* (a) Re-consider the cycle-breaking variable setting problem of Ex. 3(b). We start the interpretation in Y<sub>vsb,0</sub> = (B<sub>vs</sub>, (∅, S<sub>vs</sub><sup>2</sup>)) and successively obtain the following epistemic must/can transition structures:


Not only does it hold that Γ<sub>vsb</sub><sup>Y<sub>vsb,3</sub></sup> = Y<sub>vsb,3</sub>, but the interpretations indeed evolve monotonically w. r.t. ⊑. Moreover, the structure Y<sub>vsb,3</sub> is decided and everything that can happen also must happen, i.e., (Y<sub>vsb,3</sub>)<sub>µ</sub> = (Y<sub>vsb,3</sub>)<sub>ν</sub>.

(b) For the cyclic variable setting problem, see Ex. 1(b) and Ex. 3(a), the interpretation process is monotone, but only yields


The epistemic must/can transition structure Y<sub>vs,1</sub> is not decided, and indeed there are two solutions of Γ<sub>vs</sub> in terms of epistemic transition structures. However, the same undecidedness holds true for Γ<sub>nc</sub> of Ex. 4(c), that is, the unique solution is also missed by the constructive interpretation.

#### **4.2 Constructive Interpretation**

The separated positive (must) and negative (cannot) satisfaction relations over an epistemic must/can transition structure Y ∈ YP,A(S, E, L, S0) can be merged into a single, uniform satisfaction relation relying on the *negation normal form* of epistemic formulæ where negation only occurs in front of propositions. For an arbitrary ϕ ∈ ΦP,A there exists an equivalent nnf(ϕ) ∈ ΦP,A in negation normal form, such that, in particular

$$\begin{aligned} \mathtt{nnf}(\neg p) &= \neg p & \mathtt{nnf}(\neg\neg\varphi) &= \mathtt{nnf}(\varphi) \\ \mathtt{nnf}(\neg\mathtt{false}) &= \mathtt{true} & \mathtt{nnf}(\neg(\varphi\_1 \land \varphi\_2)) &= \mathtt{nnf}(\neg\varphi\_1) \lor \mathtt{nnf}(\neg\varphi\_2) \\ & & \mathtt{nnf}(\neg\mathsf{K}\_a \,\varphi) &= \mathsf{M}\_a \,\mathtt{nnf}(\neg\varphi) \end{aligned}$$
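The listed clauses can be turned into a recursive function on a tuple encoding of formulas. In the sketch below the clauses for ¬p, ¬¬ϕ, ¬false, ¬(ϕ1 ∧ ϕ2) and ¬K<sub>a</sub> ϕ follow the equations above; the dual clauses for ¬true, ¬(ϕ1 ∨ ϕ2) and ¬M<sub>a</sub> ϕ are the usual De Morgan style completions and are an assumption, as is the encoding itself.

```python
def nnf(f):
    """Push negations inwards until they occur only in front of propositions."""
    op = f[0]
    if op in ("p", "true", "false"):
        return f
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    if op in ("K", "M"):
        return (op, f[1], nnf(f[2]))
    if op != "not":
        raise ValueError(op)
    g = f[1]
    gop = g[0]
    if gop == "p":
        return f                                   # nnf(¬p) = ¬p
    if gop == "false":
        return ("true",)                           # nnf(¬false) = true
    if gop == "true":
        return ("false",)                          # assumed dual clause
    if gop == "not":
        return nnf(g[1])                           # nnf(¬¬ϕ) = nnf(ϕ)
    if gop == "and":
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if gop == "or":                                # assumed dual clause
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    if gop == "K":
        return ("M", g[1], nnf(("not", g[2])))     # nnf(¬K ϕ) = M nnf(¬ϕ)
    if gop == "M":                                 # assumed dual clause
        return ("K", g[1], nnf(("not", g[2])))
    raise ValueError(gop)


q1, q2 = ("p", "q1"), ("p", "q2")
guard = ("K", "a", ("not", ("and", q1, ("not", q2))))
negated_guard_nnf = nnf(("not", guard))
guard_nnf = nnf(guard)
```

For the guard K<sub>a</sub> ¬(q1 ∧ ¬q2) of Ex. 1(b), the negation becomes M<sub>a</sub> (q1 ∧ ¬q2), and the guard itself becomes K<sub>a</sub> (¬q1 ∨ q2).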

The *constructive satisfaction relation* Y, s ⊨ ϕ for a state s ∈ S<sub>ω</sub>(Y<sub>ν</sub>) and an epistemic formula ϕ ∈ Φ<sub>P,A</sub> in negation normal form is defined just as for arbitrary epistemic formulæ, but using the upper bound Y<sub>ν</sub> for the universal quantifier of K<sub>a</sub> and the lower bound Y<sub>µ</sub> for the existential quantifier of M<sub>a</sub>; in particular,

$$\begin{aligned} &Y, s \models \neg p \iff p \notin L(s) \\ &Y, s \models \mathsf{K}\_a \varphi \iff Y, s' \models \varphi \text{ f.a. } s' \in S\_\omega(Y\_\nu) \text{ with } (s, s') \in E\_a \\ &Y, s \models \mathsf{M}\_a \varphi \iff \text{ex. } s' \in S\_\omega(Y\_\mu) \text{ s.t. } (s, s') \in E\_a \text{ and } Y, s' \models \varphi \end{aligned}$$

The constructive satisfaction relation indeed combines ⊨<sup>p</sup> and ⊨<sup>n</sup>:

**Lemma 3.** *Let* Y ∈ YP,A(B)*,* ϕ ∈ ΦP,A*, and* s ∈ Sω(Yν)*. Then* Y, s |=<sup>p</sup> ϕ *iff* Y, s |= nnf(ϕ) *and* Y, s |=<sup>n</sup> ϕ *iff* Y, s |= nnf(¬ϕ)*.*

It follows that if Y<sub>µ</sub> = Y<sub>ν</sub>, then Y, s ⊨ ϕ if, and only if, Y<sub>µ</sub>, s ⊨ ϕ or, equivalently, Y<sub>ν</sub>, s ⊨ ϕ. We also obtain that constructive satisfaction is preserved when extending epistemic must/can transition structures:

**Lemma 4.** *Let* Y, Y′ ∈ Y<sub>P,A</sub>(B) *with* Y ⊑ Y′ *and let* ϕ ∈ Φ<sub>P,A</sub>*. Then* Y, s ⊨ nnf(ϕ) *implies* Y′, s ⊨ nnf(ϕ) *for all* s ∈ S<sub>ω</sub>(Y′<sub>ν</sub>)*.*

This preservation of satisfaction yields that constructive interpretation is monotone.

**Proposition 3.** *Let* Γ = (B, T) *be an epistemically guarded transition system over* Σ *and* Y, Y′ ∈ Y<sub>Σ</sub>(B) *such that* Y ⊑ Y′*. Then* Γ<sup>Y</sup> ⊑ Γ<sup>Y′</sup>*.*

Finally, we can observe that Y<sub>Σ</sub>(B) for B = (S, E, L, S<sub>0</sub>) with the ordering ⊑ is an *inductive partial order*: each directed subset ∆ ⊆ Y<sub>Σ</sub>(B) has a least upper bound ⨆∆ w. r.t. ⊑, where *directed* means that every two Y<sub>1</sub>, Y<sub>2</sub> ∈ ∆ have an upper bound Y ∈ ∆ such that Y<sub>1</sub> ⊑ Y and Y<sub>2</sub> ⊑ Y; and there is also a *bottom* or least element ⊥<sub>Σ,B</sub> = (S, E, L, S<sub>0</sub>, (∅, S × S)) ∈ Y<sub>Σ</sub>(B).

**Proposition 4.** (YΣ(B), ⊑, ⊥<sub>Σ,B</sub>) *is an inductive partial order.*

Pataraia's fixed-point theorem [9, §8.22] now guarantees that the monotone operator Y ↦ Γ<sup>Y</sup> for each epistemically guarded transition system Γ = (B, T) has a least fixed point in the inductive partial order. It can be computed by, possibly transfinite, iterated application of constructive interpretation to ⊥<sub>Σ,B</sub>, that is, Y<sub>0</sub> = ⊥<sub>Σ,B</sub>, Y<sub>α+1</sub> = Γ<sup>Y<sub>α</sub></sup> for a successor ordinal α + 1, and Y<sub>λ</sub> = ⊔<sub>α < λ</sub> Y<sub>α</sub> until equality [9, Exc. 8.19]. Compared to the iteration semantics of Sect. 3.3, the computation of the constructive semantics thus does not have to record all previous approximations in order to find a repetition.

#### **4.3 (Un-)Decided Constructive Fixed Points**

If any constructive fixed point Y = Γ<sup>Y</sup> with Y ∈ YΣ(B) is decided, then there is the solution Y<sub>µ</sub> = Γ<sup>Y<sub>µ</sub></sup> = Γ<sup>Y<sub>ν</sub></sup> = Y<sub>ν</sub> in terms of epistemic transition structures, and Γ is not contradictory. Even if it is not decided, the must/can structures Y<sub>µµ</sub> = (B, (T(Y<sub>µ</sub>), T(Y<sub>µ</sub>))) ∈ YΣ(B) and Y<sub>νν</sub> = (B, (T(Y<sub>ν</sub>), T(Y<sub>ν</sub>))) ∈ YΣ(B) satisfy Y ⊑ Y<sub>µµ</sub> and Y ⊑ Y<sub>νν</sub>, such that by Prop. 3 we obtain Y = Γ<sup>Y</sup> ⊑ Γ<sup>Y<sub>µµ</sub></sup>, Γ<sup>Y<sub>νν</sub></sup>, which yields Y<sub>µ</sub> ⊆ Γ<sup>Y<sub>µ</sub></sup> and Γ<sup>Y<sub>ν</sub></sup> ⊆ Y<sub>ν</sub>, but not equality, in general. For the least constructive fixed point µΓ, any solution M = Γ<sup>M</sup> thus satisfies (µΓ)<sub>µ</sub> ⊆ M ⊆ (µΓ)<sub>ν</sub>, always giving sound lower and upper bounds and, if µΓ is decided, moreover unique solvability:

**Proposition 5.** *Let* Γ = (B, T ) *be an epistemically guarded transition system over* Σ *and assume* µΓ ∈ YΣ(B) *is decided. Then* Γ *has a unique solution in* MΣ(B)*.*

Still, even for epistemically guarded transition systems that provide epistemic witnesses it is not guaranteed that the least constructive fixed point is decided:

*Example 8.* Consider the following epistemically guarded transition system Γ<sub>nd</sub> = (B<sub>nd</sub>, T<sub>nd</sub>) over Σ<sub>nd</sub> = (P<sub>nd</sub>, A<sub>nd</sub>) with P<sub>nd</sub> = {p, q} and A<sub>nd</sub> = {a, b}:

[Diagram of Γ<sub>nd</sub> with the states u<sub>0</sub> and u<sub>1</sub>; the figure is not recoverable from the source.]

Constructive interpretation yields the non-decided fixed point Y<sub>nd</sub> with T(Y<sub>nd,µ</sub>) = ∅ and T(Y<sub>nd,ν</sub>) = {(u<sub>0</sub>, u<sub>1</sub>)}, as Y<sub>nd</sub>, u<sub>0</sub> ⊭ K<sub>b</sub> M<sub>a</sub> p, but also Y<sub>nd</sub>, u<sub>0</sub> ⊭ M<sub>b</sub> K<sub>a</sub> ¬p: the states u<sub>0</sub> and u<sub>1</sub> can be distinguished by agent a, and agent b cannot tell whether a step has been taken. In u<sub>0</sub> the formula M<sub>a</sub> p holds w. r.t. Y<sub>nd</sub>, but in u<sub>1</sub> it does not, since (u<sub>1</sub>, u<sub>0</sub>) ∉ E<sub>nd,a</sub>. On the other hand, Γ<sub>nd</sub> provides epistemic witnesses pathologically, since Γ<sub>nd</sub><sup>M</sup>, s |= K<sub>b</sub> M<sub>a</sub> p for any M ∈ M<sub>Σnd</sub>(B<sub>nd</sub>) and any s ∈ Sω(Γ<sub>nd</sub><sup>M</sup>), and hence has a unique interpretation, which in this case is Γ<sub>nd</sub><sup>Y<sub>nd,µ</sub></sup> = Y<sub>nd,ν</sub> = Γ<sub>nd</sub><sup>Y<sub>nd,ν</sub></sup>.

For synchronous epistemically guarded transition systems, however, the least fixed point is decided, since all knowledge refers to a past that must have happened:

**Lemma 5.** *Let* Γ = (B, T) *be an epistemically guarded transition system over* Σ *that is synchronous. Let* Y ∈ YΣ(B) *satisfy* Γ<sup>Y</sup> = Y*. Then* Y *is decided.*

Summing up, the constructive approach to interpreting knowledge-based programs subsumes the solutions for synchronous programs and provides a sound procedure for obtaining lower and upper bounds for the execution of both synchronous and asynchronous programs. The approach, however, is not complete: If the least constructive fixed point µΓ is undecided, a system Γ may be contradictory without any solution (see Ex. 3(a)), self-fulfilling with several solutions (see Ex. 4(d)), or it may have a unique solution in terms of epistemic transition structures (see Ex. 4(c)). One strategy that suggests itself for analysing Γ further is to check whether an interpretation using the lower bound (µΓ)<sub>µ</sub> of the least fixed point satisfies Γ<sup>(µΓ)<sub>µ</sub></sup> = (µΓ)<sub>ν</sub> = Γ<sup>(µΓ)<sub>ν</sub></sup>, which means that when executing according to what must happen all that can happen is already covered (see Ex. 8).

# **5 Knowledge-based Programs as Rule Systems**

The "executions" of an epistemically guarded transition system Γ can be captured as derivations of two mutually dependent inductive rule systems, as used for inductive definitions [1,19]. One rule system defines the reachability in Γ, the other one the satisfaction of knowledge formulæ in negation normal form over Γ. When Γ provides epistemic witnesses, the mutual dependence can be resolved by stratifying the rule system for reachability according to the depth of the execution. In the general case, the nonmonotone dependence of the formula satisfaction system on the reachability system (the more states are reachable, the less is known) can be mitigated by extending the notion of rule systems to include also negative premisses: The conclusion of a rule is derivable if all its (positive) premisses are derivable, but none of its negative premisses. When applied to knowledge formulæ, negative premisses express that no counterexample is reachable.

The general rule systems can also be read as logic programs with "negation as failure" [11]. A direct application of the must/can approximation technique to the general rule system or, equivalently, the logic program resulting from a knowledge-based program reconstructs the Kripke-Kleene fixed point; the possible solutions correspond to "stable models" [16].

# **5.1 Inductive Rule Systems**

An *inductive rule system* R consists of *rules* of the form X/y where the *premisses* X ⊆ U and the *conclusion* y ∈ U are drawn from some *universe* of *judgements* U. A rule X/y is interpreted as "if all X can be inferred, then y can be inferred". The *derivations* in R together with their *sets of premisses* and *conclusions* are inductively defined as follows:


A y ∈ U is *derivable* in R if there is a derivation in R with the empty set of premisses and conclusion y. The set of derivable conclusions of R coincides with the least fixed point µR̂ of R̂ : ℘U → ℘U defined by R̂(P) = {y ∈ U | ex. X/y ∈ R s. t. X ⊆ P}.

In logic programming terms, a rule X/y ∈ R yields a Horn clause y ← X [11]. The least fixed point µR̂ coincides with the minimal Herbrand model of the logic program corresponding to R and thus with the single stable model, as no negation is involved [11,16].
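For a finite rule system, the least fixed point of R̂ can be obtained by applying the rules until nothing new is derived. The following is a minimal sketch, assuming rules are encoded as pairs of a premiss set and a conclusion; the encoding, the function name `derivable`, and the sample rules are illustrative and not part of the paper.

```python
def derivable(rules):
    """Least fixed point of P -> {y | X/y in rules and X is a subset of P}.

    `rules` is a finite list of pairs (X, y), X a frozenset of premisses.
    """
    derived = set()
    changed = True
    while changed:
        changed = False
        for premisses, conclusion in rules:
            # A rule fires once all of its premisses have been derived.
            if conclusion not in derived and premisses <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rule system: "a" is an axiom, "c" only follows from itself.
rules = [
    (frozenset(), "a"),
    (frozenset({"a"}), "b"),
    (frozenset({"a", "b"}), "d"),
    (frozenset({"c"}), "c"),
]
print(derivable(rules))
```

The self-supporting rule {c}/c does not make `c` derivable, which is the difference between the least fixed point and an arbitrary fixed point of R̂.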

For expressing reachability and the satisfaction of knowledge formulæ in an epistemically guarded transition system Γ = (S, E, L, S0, T) over (P, A) as inductive rule systems, we use two types of judgements, one of the form s ∈<sup>Γ</sup> S<sub>ω</sub> with s ∈ S for "state s is reachable in Γ", and one of the form s |=<sup>Γ</sup> ϕ with s ∈ S and ϕ ∈ ΦP,A in negation normal form for "state s satisfies formula ϕ in Γ". The rules for reachability read:

$$\frac{}{s_0 \in^{\Gamma} S_\omega}\ \text{if } s_0 \in S_0 \qquad\qquad \frac{s \in^{\Gamma} S_\omega}{s' \in^{\Gamma} S_\omega}\ \text{if ex. } (\varphi \supset B) \in \mathcal{T},\ (s, s') \in B, \text{ and } s \models^{\Gamma} \varphi$$

where s |=<sup>Γ</sup> ϕ in the side condition of the second rule requires this judgement to be derivable in the rule system for satisfaction. The rules for this system read:

$$\begin{gathered} \frac{}{s \models^{\Gamma} \text{true}}\ \text{if } s \in^{\Gamma} S_\omega \qquad \frac{}{s \models^{\Gamma} p}\ \text{if } s \in^{\Gamma} S_\omega,\ p \in L(s) \qquad \frac{}{s \models^{\Gamma} \neg p}\ \text{if } s \in^{\Gamma} S_\omega,\ p \notin L(s) \\ \frac{s \models^{\Gamma} \varphi_1 \quad s \models^{\Gamma} \varphi_2}{s \models^{\Gamma} \varphi_1 \land \varphi_2} \qquad \frac{s \models^{\Gamma} \varphi_1}{s \models^{\Gamma} \varphi_1 \lor \varphi_2} \qquad \frac{s \models^{\Gamma} \varphi_2}{s \models^{\Gamma} \varphi_1 \lor \varphi_2} \\ \frac{s' \models^{\Gamma} \varphi}{s \models^{\Gamma} \mathsf{M}_a \varphi}\ \text{if } (s, s') \in E_a,\ s' \in^{\Gamma} S_\omega \qquad \frac{(s' \models^{\Gamma} \varphi)_{s' \in^{\Gamma} S_\omega,\ (s, s') \in E_a}}{s \models^{\Gamma} \mathsf{K}_a \varphi} \end{gathered}$$

Here, the last rule for satisfaction is in fact not monotone w. r.t. reachability: In order to infer s |=<sup>Γ</sup> K<sub>a</sub> ϕ it is not necessary to infer s′ |=<sup>Γ</sup> ϕ for all s′ with (s, s′) ∈ E<sub>a</sub>, but only for those for which s′ ∈<sup>Γ</sup> S<sub>ω</sub> can be deduced, and then indeed for all of those.

The notion of providing epistemic witnesses allows us to stratify the inductive rule systems according to the involved depth k ≥ 0: We specialise the judgement s ∈<sup>Γ</sup> S<sub>ω</sub> into s ∈<sup>Γ</sup> S<sub>k</sub> meaning "state s is reachable in Γ in up to k steps" and, similarly, the judgement s |=<sup>Γ</sup> ϕ into s |=<sup>Γ</sup><sub>k</sub> ϕ meaning "formula ϕ is satisfied in Γ at state s considering states reachable in up to k steps". The rules for reachability become for all k ≥ 0:

$$\frac{}{s_0 \in^{\Gamma} S_k}\ \text{if } s_0 \in S_0 \qquad\qquad \frac{s \in^{\Gamma} S_k}{s' \in^{\Gamma} S_{k+1}}\ \text{if ex. } (\varphi \supset B) \in \mathcal{T},\ (s, s') \in B, \text{ and } s \models^{\Gamma}_k \varphi$$

Analogously the rules for satisfaction become for all k ≥ 0:

$$\begin{gathered} \frac{}{s \models^{\Gamma}_k \text{true}}\ \text{if } s \in^{\Gamma} S_k \qquad \frac{}{s \models^{\Gamma}_k p}\ \text{if } s \in^{\Gamma} S_k,\ p \in L(s) \qquad \frac{}{s \models^{\Gamma}_k \neg p}\ \text{if } s \in^{\Gamma} S_k,\ p \notin L(s) \\ \frac{s \models^{\Gamma}_k \varphi_1 \quad s \models^{\Gamma}_k \varphi_2}{s \models^{\Gamma}_k \varphi_1 \land \varphi_2} \qquad \frac{s \models^{\Gamma}_k \varphi_1}{s \models^{\Gamma}_k \varphi_1 \lor \varphi_2} \qquad \frac{s \models^{\Gamma}_k \varphi_2}{s \models^{\Gamma}_k \varphi_1 \lor \varphi_2} \\ \frac{s' \models^{\Gamma}_k \varphi}{s \models^{\Gamma}_k \mathsf{M}_a \varphi}\ \text{if } (s, s') \in E_a,\ s' \in^{\Gamma} S_k \qquad \frac{(s' \models^{\Gamma}_k \varphi)_{s' \in^{\Gamma} S_k,\ (s, s') \in E_a}}{s \models^{\Gamma}_k \mathsf{K}_a \varphi} \end{gathered}$$

In particular, the rules for s |=<sup>Γ</sup><sub>k</sub> M<sub>a</sub> ϕ and s |=<sup>Γ</sup><sub>k</sub> K<sub>a</sub> ϕ are sound for epistemically guarded transition systems providing epistemic witnesses. The notion of "providing epistemic witnesses" requires that, if K<sub>a</sub> ϕ does not hold at depth k, there is a counterexample to ϕ at depth ≤ k. The general case can be covered by dropping the depths and taking into account that K<sub>a</sub> ϕ does not hold at some state s if, and only if, there is some reachable, a-indistinguishable state s′ at which ϕ does not hold. Therefore, in order to derive that K<sub>a</sub> ϕ indeed holds at some reachable state s, it is necessary and sufficient to show that it is *not* possible to derive that ¬ϕ holds at some reachable, a-indistinguishable state s′.

# **5.2 General Rule Systems with Positive and Negative Premisses**

For expressing negative information in terms of a rule system, we complement the positive premisses of the rules by negative ones: We consider general *rule systems* R over a universe U consisting of rules of the form (X,/Z)/y where X, Z ⊆ U are the *positive* and *negative premisses*, and y ∈ U is the *conclusion*; it is interpreted as "if all X can be inferred but no Z, then y can be inferred". The *derivations* in R together with their *sets of positive and negative premisses* and *conclusions* are again inductively defined as follows:


For a B ⊆ U, let R̄(B) be all those y ∈ U such that there is a derivation of y in R with the empty set of positive premisses and no negative premisses in B. The set of *derivable conclusions* of R is given by the least fixed point of R̄ if it exists.

From the logic programming perspective, a general rule (X,/Z)/y ∈ R can be seen as a clause of the form y ← X,/Z with / read as "negation as failure" [5,11]. Checking that a B ⊆ U is a "stable model" of the logic program obtained from R in this way corresponds to the following process on general rule systems: first the reduct R<sup>B</sup> is formed by disregarding all rules (X,/Z)/y ∈ R with B ∩ Z ≠ ∅ and transforming the remaining rules (X,/Z)/y ∈ R into X/y ∈ R<sup>B</sup>; then R<sup>B</sup> is an inductive rule system and B is stable if B = µR̂<sup>B</sup>. In particular, the stable models correspond to the *solutions* of R̄(B) = B.

With this generalised notion of rule systems we can reformulate and combine the two inference systems for reachability and satisfaction in an epistemically guarded transition system Γ = (S, E, L, S0, T) over (P, A) by using a single judgement s |=<sup>Γ</sup><sub>ω</sub> ϕ for "state s satisfies ϕ in Γ and state s is reachable in Γ". A negative premiss /(s |=<sup>Γ</sup><sub>ω</sub> true) thus stands for "s ∈<sup>Γ</sup> S<sub>ω</sub> cannot be deduced". The new rules, which now also have negative premisses, read:

$$\begin{gathered} \frac{}{s_0 \models^{\Gamma}_\omega \text{true}}\ \text{if } s_0 \in S_0 \qquad \frac{s \models^{\Gamma}_\omega \varphi}{s' \models^{\Gamma}_\omega \text{true}}\ \text{if ex. } (\varphi \supset B) \in \mathcal{T},\ (s, s') \in B \\ \frac{s \models^{\Gamma}_\omega \text{true}}{s \models^{\Gamma}_\omega p}\ \text{if } p \in L(s) \qquad \frac{s \models^{\Gamma}_\omega \text{true}}{s \models^{\Gamma}_\omega \neg p}\ \text{if } p \notin L(s) \\ \frac{s \models^{\Gamma}_\omega \varphi_1 \quad s \models^{\Gamma}_\omega \varphi_2}{s \models^{\Gamma}_\omega \varphi_1 \land \varphi_2} \qquad \frac{s \models^{\Gamma}_\omega \varphi_1}{s \models^{\Gamma}_\omega \varphi_1 \lor \varphi_2} \qquad \frac{s \models^{\Gamma}_\omega \varphi_2}{s \models^{\Gamma}_\omega \varphi_1 \lor \varphi_2} \\ \frac{s' \models^{\Gamma}_\omega \varphi}{s \models^{\Gamma}_\omega \mathsf{M}_a \varphi}\ \text{if } (s, s') \in E_a \qquad \frac{s \models^{\Gamma}_\omega \text{true} \quad /(s' \models^{\Gamma}_\omega \mathrm{nnf}(\neg\varphi))_{(s, s') \in E_a}}{s \models^{\Gamma}_\omega \mathsf{K}_a \varphi} \end{gathered}$$

The rule for s |=<sup>Γ</sup><sub>ω</sub> K<sub>a</sub> ϕ checks that s is reachable and that no counterexample to ϕ can be reached at an a-indistinguishable state.

Using general rule systems, the solvability of an epistemically guarded transition system is shifted to computing derivable conclusions. As for knowledge-based programs, it is not obvious from just the rules of a system R whether there are solutions of R̄(B) = B at all, and whether there is a least one.

*Example 9.* (a) The general rule system

$$R_0 = \left\{\frac{x_1}{x_1},\ \frac{/x_1 \quad /x_2}{x_2}\right\} \quad \text{over} \quad \{x_1, x_2\}$$

has no set of derivable conclusions, since R̄<sub>0</sub> has no fixed point; in particular, R̄<sub>0</sub>(∅) = {x<sub>2</sub>} and R̄<sub>0</sub>({x<sub>1</sub>}) = ∅ = R̄<sub>0</sub>({x<sub>2</sub>}). In terms of stable models, computing R̄<sub>0</sub>(∅) amounts to removing the negative premisses from the rule (∅,/{x<sub>1</sub>, x<sub>2</sub>})/x<sub>2</sub>, such that the inductive rules {x<sub>1</sub>}/x<sub>1</sub> and ∅/x<sub>2</sub> remain; and computing R̄<sub>0</sub>({x<sub>i</sub>}) leads to the single inductive rule {x<sub>1</sub>}/x<sub>1</sub> for i ∈ {1, 2}.

R<sub>0</sub> also demonstrates that the set of derivable conclusions of a general rule system R need not coincide with the least fixed point of the operator R̂ : ℘U → ℘U when transferred from inductive rule systems by now setting R̂(P) = {y ∈ U | ex. (X,/Z)/y ∈ R s.t. X ⊆ P, P ∩ Z = ∅}: µR̂<sub>0</sub> = {x<sub>1</sub>}.

On the other hand, in view of the general rule system for epistemically guarded transition systems, R<sub>0</sub> can also be rephrased as a knowledge-based program with a single agent a and a single variable x ∈ {0, 1, 2}, which a cannot observe, started with x = 0:

$$\begin{array}{l} \mathtt{if}\ \mathsf{M}_a\, x = 1 \rightarrow x \leftarrow 1 \\ \mathtt{[]}\ \mathsf{K}_a (x \neq 1 \land x \neq 2) \rightarrow x \leftarrow 2\ \mathtt{fi} \end{array}$$

(b) There may be several solutions of a general rule system, but no least one:

$$R_1 = \left\{\frac{/x_1}{x_3},\ \frac{/x_3}{x_1}\right\} \quad \text{over} \quad \{x_1, x_3\}$$

has the solutions {x1} and {x3}, but ∅ is no solution. It corresponds to the "variable setting" knowledge-based program of the introduction, see Ex. 1(b):

$$\begin{array}{l} \mathtt{if}\ \mathsf{K}_a\, x \neq 1 \rightarrow x \leftarrow 3 \\ \mathtt{[]}\ \mathsf{K}_a\, x \neq 3 \rightarrow x \leftarrow 1\ \mathtt{fi} \end{array}$$

(c) Combining a contradictory rule (∅,/{x<sub>1</sub>, x<sub>2</sub>})/x<sub>2</sub> with the non-determined rules of R<sub>1</sub> we obtain the rule system

$$R_2 = \left\{\frac{/x_1}{x_3},\ \frac{/x_3}{x_1},\ \frac{/x_1 \quad /x_2}{x_2}\right\} \quad \text{over} \quad \{x_1, x_2, x_3\}$$

which has the unique solution {x<sub>1</sub>}: if x<sub>3</sub> were inferable, i.e., x<sub>1</sub> not inferable, this would trigger the contradictory rule for x<sub>2</sub> (see Ex. 4(c)).
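The three rule systems of Example 9 are small enough to check by brute force. The sketch below follows the reduct construction described in Sect. 5.2: for every candidate B it forms the reduct, computes its least fixed point, and compares the result with B. The triple encoding (X, Z, y) and all function names are assumptions made for this illustration.

```python
from itertools import combinations


def least_model(inductive_rules):
    """Least fixed point of a finite inductive rule system of pairs (X, y)."""
    derived, changed = set(), True
    while changed:
        changed = False
        for X, y in inductive_rules:
            if y not in derived and X <= derived:
                derived.add(y)
                changed = True
    return derived


def reduct(rules, B):
    """Drop rules with a negative premiss in B; strip the negative premisses."""
    return [(X, y) for (X, Z, y) in rules if not (B & Z)]


def solutions(rules, universe):
    """All B within `universe` that equal the least model of their reduct."""
    elems = sorted(universe)
    found = []
    for size in range(len(elems) + 1):
        for subset in combinations(elems, size):
            B = frozenset(subset)
            if least_model(reduct(rules, B)) == B:
                found.append(B)
    return found


F = frozenset
# Rules are triples (positive premisses, negative premisses, conclusion).
R0 = [(F({"x1"}), F(), "x1"), (F(), F({"x1", "x2"}), "x2")]
R1 = [(F(), F({"x1"}), "x3"), (F(), F({"x3"}), "x1")]
R2 = R1 + [(F(), F({"x1", "x2"}), "x2")]

print(solutions(R0, {"x1", "x2"}))        # no solution
print(solutions(R1, {"x1", "x3"}))        # two incomparable solutions
print(solutions(R2, {"x1", "x2", "x3"}))  # a single solution
```

The enumeration is exponential in the size of the universe, so it serves only as a cross-check for examples of this size.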

# **5.3 Solving General Rule Systems**

The observations and definitions for epistemic must/can transition structures and constructive interpretation, see Sect. 4.2, can now readily be transferred to a more abstract account for general rule systems. In fact, this reconstructs the "Kripke-Kleene fixpoint" using under- and over-approximations [11], though now using an inductive partial order. We also relate the case where the constructive interpretation is not only monotone, but continuous to knowledge-based programs.

Define, for a universe U, the set ℘<sup>±</sup>U as {(P, Q) ∈ ℘U × ℘U | P ⊆ Q} and the relation ⊆<sup>±</sup> ⊆ ℘<sup>±</sup>U × ℘<sup>±</sup>U as (P, Q) ⊆<sup>±</sup> (P′, Q′) if, and only if, P ⊆ P′ and Q ⊇ Q′.

**Lemma 6.** (℘<sup>±</sup>U, ⊆<sup>±</sup>, ⊥<sup>±</sup><sub>U</sub>) *with* ⊥<sup>±</sup><sub>U</sub> = (∅, U) *is an inductive partial order.*

For a general rule system R over U with positive and negative premisses define the operator Ř : ℘<sup>±</sup>U → ℘<sup>±</sup>U that describes what *must* and what *can* be derived given what is assumed to be definitely and potentially derivable:

$$\check{R}(P,Q) = \bigl(\{ y \in U \mid \text{ex. } (X, /Z)/y \in R \text{ s.t. } X \subseteq P,\ Q \cap Z = \emptyset \},\ \{ y \in U \mid \text{ex. } (X, /Z)/y \in R \text{ s.t. } X \subseteq Q,\ P \cap Z = \emptyset \}\bigr)$$

This is well-defined: if (P, Q) ∈ ℘<sup>±</sup>U, then Ř(P, Q) ∈ ℘<sup>±</sup>U, since for P ⊆ Q and each (X,/Z)/y ∈ R with X ⊆ P and Q ∩ Z = ∅ it holds that X ⊆ Q and P ∩ Z = ∅. The operator is always monotone:

**Lemma 7.** *Let* R *be a rule system over* U*. If* (P<sub>1</sub>, Q<sub>1</sub>) ⊆<sup>±</sup> (P<sub>2</sub>, Q<sub>2</sub>)*, then* Ř(P<sub>1</sub>, Q<sub>1</sub>) ⊆<sup>±</sup> Ř(P<sub>2</sub>, Q<sub>2</sub>)*.*

As for constructive interpretation, Pataraia's fixed-point theorem now guarantees that the monotone operator Ř on the inductive partial order (℘<sup>±</sup>U, ⊆<sup>±</sup>, ⊥<sup>±</sup><sub>U</sub>) has a least fixed point. Again, it can be "computed" by possibly transfinite iterated application of Ř to ⊥<sup>±</sup><sub>U</sub>. If, however, Ř is even continuous, then, by Kleene's fixed-point theorem, it suffices to consider all finite approximations, i.e., µŘ = ⋃<sup>±</sup><sub>n∈ℕ</sub> Ř<sup>n</sup>(⊥<sup>±</sup><sub>U</sub>); that Ř is *continuous* means that if ∆ ⊆ ℘<sup>±</sup>U is directed, then ⋃<sup>±</sup> Ř(∆) = Ř(⋃<sup>±</sup> ∆).

**Lemma 8.** *Let* R *be a rule system over* U *such that every rule of* R *has only finitely many positive and negative premisses. Then* Ř *is continuous.*

The rule system for an epistemically guarded transition system Γ = (S, E, L, S0, T) over (P, A) always has only finitely many positive premisses; if for each s ∈ S and each a ∈ A the set {s′ ∈ S | (s, s′) ∈ E<sub>a</sub>} is finite, then there are also only finitely many negative premisses, such that the corresponding must/can operator is continuous.
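For a finite universe the iteration from ⊥<sup>±</sup><sub>U</sub> = (∅, U) stabilises after finitely many steps, so the least fixed point of the must/can operator can be computed directly. The following sketch assumes rules encoded as triples (X, Z, y); the function names and the rule system `R3` are hypothetical and only serve to show a decided outcome next to the undecided ones of R<sub>1</sub> and R<sub>2</sub>.

```python
def must_can_step(rules, P, Q):
    """One application of the must/can operator to the pair (P, Q)."""
    # Must: positive premisses definitely hold, no negative premiss is possible.
    must = {y for (X, Z, y) in rules if X <= P and not (Q & Z)}
    # Can: positive premisses possibly hold, no negative premiss definitely holds.
    can = {y for (X, Z, y) in rules if X <= Q and not (P & Z)}
    return must, can


def least_must_can_fixed_point(rules, universe):
    """Iterate from the bottom element (empty set, universe) until stable."""
    P, Q = set(), set(universe)
    while True:
        P_next, Q_next = must_can_step(rules, P, Q)
        if (P_next, Q_next) == (P, Q):
            return P, Q
        P, Q = P_next, Q_next


F = frozenset
R1 = [(F(), F({"x1"}), "x3"), (F(), F({"x3"}), "x1")]
R2 = R1 + [(F(), F({"x1", "x2"}), "x2")]
# Hypothetical system: x1 is an axiom, x2 needs "not x1", x3 needs x1 and "not x2".
R3 = [(F(), F(), "x1"), (F(), F({"x1"}), "x2"), (F({"x1"}), F({"x2"}), "x3")]

print(least_must_can_fixed_point(R1, {"x1", "x3"}))        # undecided
print(least_must_can_fixed_point(R2, {"x1", "x2", "x3"}))  # undecided
print(least_must_can_fixed_point(R3, {"x1", "x2", "x3"}))  # decided
```

For R<sub>2</sub> the result stays at the bottom element although the system has the unique solution {x<sub>1</sub>}: the bounds are sound but need not be tight.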

# **6 Reasoning About Knowledge-based Programs**

We have implemented the constructive interpretation of knowledge-based programs in the prototypical "Temporal Epistemic Model Interpreter and Checker" (tEmIc<sup>5</sup>). The tool first computes the least constructive fixed point of a (finite state) epistemically guarded transition system. If the least fixed point is decided, the least solution in terms of epistemic transition structures has been found; otherwise it is checked whether the reinterpretation using the lower bound of the undecided least fixed point yields a solution.

<sup>5</sup> https://bitbucket.org/knappale/temic

If either succeeds, properties of the resulting model can be checked. These properties can be expressed in CTLK, the combination of the branching "Computation Tree Logic" (CTL) and epistemic logic [21]. What is more, CTLK can also be used in tEmIc for the action guards. The constructive interpretation just evaluates each universal quantifier of a CTL formula (A for "on all paths") over the upper bound and each existential quantifier (E for "on some path") over the lower bound. This adds the temporal dimension to the domain of application of knowledge-based programs. For the run-based interpreted systems of Fagin et al. [13], Van der Hoek and Wooldridge [20] and Su [27] provide transformations for linear-time model checking based on local propositions, though for a fixed set of runs that does not depend on the evaluation of knowledge guards. The CTLK-model checker MCMAS [21] similarly operates on a fixed, predetermined model. In dynamic epistemic logic and its model checker DEMO [31], the transition structure is given by epistemic actions.

We first briefly recapitulate CTLK and then show its constructive evaluation over epistemic must/can transition structures. We next describe tEmIc by means of the bit transmission problem and the small paradoxical exercise of the "unexpected examination"; the tEmIc distribution also contains specifications for the well-known problems "Muddy Children" [31, pp. 93ff.] and "Sum-and-Product" [31, pp. 96f.]. Finally, we proceed to an application where CTLK is also used in the action guards: the Java memory model.

# **6.1 CTLK**

The *CTLK-formulæ* over (P, A) are defined by the following grammar:

$$\varphi ::= p \mid \text{false} \mid \neg \varphi \mid \varphi_1 \land \varphi_2 \mid \mathsf{K}_a \varphi \mid \mathsf{E} \mathsf{X} \varphi \mid \mathsf{E} \mathsf{G} \varphi \mid \mathsf{E} [\varphi_1 \mathsf{U} \varphi_2]$$

where p ∈ P and a ∈ A. The path quantifier E is interpreted as "there is a path", the temporal modality X as "in the next step", G as "always", and U as "until". We also consider the path quantifier A for "on all paths" and the modalities F for "eventually" and R for "release", such that ¬EG ¬ϕ is abbreviated by AF ϕ and ¬E[¬ϕ<sub>1</sub> U ¬ϕ<sub>2</sub>] by A[ϕ<sub>1</sub> R ϕ<sub>2</sub>]. The *satisfaction relation* M, s |= ϕ of a CTLK-formula ϕ over (P, A) at state s ∈ S of an epistemic transition structure M = (S, E, L, S0, T) over (P, A) conservatively extends the satisfaction relation of epistemic formulæ by

$$\begin{aligned} M, s \models \mathsf{EX}\varphi &\iff \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ s.t. } M, s_1 \models \varphi \\ M, s \models \mathsf{EG}\varphi &\iff \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ s.t. } M, s_i \models \varphi \text{ f.a. } i \in \mathbb{N} \\ M, s \models \mathsf{E}[\varphi_1 \mathsf{U} \varphi_2] &\iff \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ and } l \in \mathbb{N} \text{ s.t. } M, s_i \models \varphi_1 \text{ f.a. } 0 \le i < l \text{ and } M, s_l \models \varphi_2 \end{aligned}$$

where P(M, s) denotes all *paths* of M, i.e., the infinite state sequences s<sub>0</sub>, s<sub>1</sub>, . . . ∈ S with s<sub>0</sub> = s and (s<sub>i</sub>, s<sub>i+1</sub>) ∈ T for all i ∈ ℕ. A CTLK-formula ϕ is *valid* in M, written M |= ϕ, if it is satisfied in all initial states, i.e., M, s<sub>0</sub> |= ϕ for all s<sub>0</sub> ∈ S<sub>0</sub>(M).

For a direct definition of the satisfaction of CTLK-formulæ with an A, the existential path quantification for E has to be replaced by universal path quantification. As for simple epistemic logic, CTLK including AX ϕ, AG ϕ etc. admits a negation normal form (see, e. g., [3, pp. 333f.]). The *constructive satisfaction relation* of a CTLK-formula in negation normal form over an epistemic must/can transition structure Y = (S, E, L, S0, T) over (P, A) at a state s ∈ Sω(Yν), written Y, s |= ϕ, conservatively extends the constructive satisfaction relation of epistemic formulæ and interprets E over the lower bound Y<sub>µ</sub> and A over the upper bound Y<sub>ν</sub> such that, in particular,

$$\begin{aligned} &Y, s \models \mathsf{EF}\varphi \iff \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(Y_\mu, s) \text{ and } i \in \mathbb{N} \text{ s.t. } Y, s_i \models \varphi \\ &Y, s \models \mathsf{AF}\varphi \iff \text{f.a. } s_0, s_1, \ldots \in \mathcal{P}(Y_\nu, s) \text{ ex. } i \in \mathbb{N} \text{ s.t. } Y, s_i \models \varphi \end{aligned}$$
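On a finite structure the two clauses can be evaluated with the usual fixed-point characterisations, applied to the lower bound for EF and to the upper bound for AF. The sketch below is an illustration under assumptions: it works on explicit sets of states rather than binary decision diagrams, takes the set `target` of states satisfying ϕ as given, ignores the restriction to states in Sω(Yν), and uses a hypothetical three-state example. Since paths are infinite, states without an infinite continuation are filtered out explicitly.

```python
def pre(T, Z):
    """States that have at least one T-successor in Z."""
    return {u for (u, v) in T if v in Z}


def eg(T, good):
    """States with an infinite T-path that stays in `good` (greatest fixed point)."""
    Z = set(good)
    while True:
        Z_next = Z & pre(T, Z)
        if Z_next == Z:
            return Z
        Z = Z_next


def ef(states, T_mu, target):
    """EF over the lower bound: some infinite must-path visits `target`."""
    live = eg(T_mu, states)          # states with some infinite T_mu-path
    Z = set(target) & live
    while True:                      # least fixed point: backwards reachability
        Z_next = Z | pre(T_mu, Z)
        if Z_next == Z:
            return Z
        Z = Z_next


def af(states, T_nu, target):
    """AF over the upper bound: every infinite can-path visits `target`."""
    return set(states) - eg(T_nu, set(states) - set(target))


# Hypothetical must/can structure with T_mu contained in T_nu.
states = {0, 1, 2}
T_mu = {(0, 1), (1, 1)}
T_nu = {(0, 1), (1, 1), (0, 2), (2, 2)}
target = {1}

print(ef(states, T_mu, target))
print(af(states, T_nu, target))
```

At state 0 the formula EF ϕ holds, because state 1 must be reachable, while AF ϕ fails, because the run through state 2 can happen.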

#### **6.2 tEmIc**

tEmIc is a symbolic model interpreter and checker for epistemically guarded transition systems using CTLK. It is written in Java and uses binary decision diagrams for state space representation [28]; it also supports bounded integers and their arithmetic. Given a specification, tEmIc first computes the least constructive fixed point by iterated must/can interpretation. If this fixed point is not decided, it checks whether another interpretation using the lower bound of the fixed point yields a solution. If either succeeds, tEmIc proceeds with model checking given properties; these statements can be specified as CTLK-formulæ which have to hold in all initial states or as a reachability query. Reachable deadlock states without outgoing transitions result in a warning.

For example, the bit transmission problem of the introduction as formalised in Ex. 1(a) can be represented as a tEmIc specification as follows (rules are introduced by keyword action followed by a name of the rule and the rule definition):

```
var sbit, ack, rbit, snt : boolean initial (ack | rbit | snt) <-> false;
agent S = { sbit, ack }; agent R = { rbit, snt };
let R_knows_bit = exists bit:boolean . K[R] sbit <-> bit;
action S_sends_bit_ok
guard not K[S] R_knows_bit do rbit := sbit, snt := true;
action S_sends_bit_failed
guard not K[S] R_knows_bit do ;
action R_sends_ack_ok
guard R_knows_bit and not K[R] K[S] R_knows_bit do ack := true;
action R_sends_ack_failed
guard R_knows_bit and not K[R] K[S] R_knows_bit do ;
```
Constructive interpretation yields in a few milliseconds the decided least fixed point of Ex. 2, over which some CTLK-properties can be checked:


The first two are reported to hold, but the last does not since agent R cannot gather enough information to be sure that the bit has been received by agent S.

For another example, consider the "unexpected examination" paradox [10, Sect. 4.7, there called "unexpected hanging"] (for a detailed account see, e. g., [26, Sects. 5.2f.]): A class is told that within the next week there will be an exam, but it will be a surprise. The class might reason that the exam cannot happen on Friday, because if there has been no exam up to Thursday it will not be a surprise on Friday any more; by backward induction it might reason that there cannot be a surprise exam in the next week at all. This problem statement can be readily expressed as a tEmIc specification:

```
var day : 0..5 initial day = 0;
var exam : 0..4;
var written : boolean initial written <-> false;
agent P = { day, written };
action act1
guard day < 5 and (day = exam) and (not K[P] day = exam) and not written
do written := true, day := day+1;
action act2
guard day < 5 and (day != exam) do day := day+1;
action stutter
do ;
```
Again, constructive interpretation yields in a few milliseconds a decided least fixed point. Over this epistemic transition structure we can check that on, e. g., Wednesday the exam can be written and still is indeed a surprise:

```
check reachable exam = 2 & written;
```
For such a reachability check tEmIc also provides a witness that tells that act2 is executed twice after which act1 follows. The following CTLK-property, however, is not satisfied, as it would have to hold in all initial states — and with exam being 4 the class cannot be surprised any more:

check initial EF written;

#### **6.3 Memory Models**

Memory models regulate the interaction between threads, their caches, and the main memory [23]. The original Java memory model, one of the first formal models of this kind, has been harshly criticised for making several compiler optimisations impossible and has subsequently been superseded by a more liberal model [17, Ch. 17]. Keeping strong guarantees for sequentially consistent, well-synchronised programs, reorderings of data-independent statements or early, "prescient" reads from other threads are allowed for programs with data races. Still, some limits, like consistency with data or control flow dependencies or no "out-of-thin-air" values, should be in force [25,2].

For example, in the following two-threaded Java-like program to the left it should be possible that both thread-local registers r1 and r2 are assigned the value 1 when reading the global, shared variables x and y: A compiler could reorder the data-independent statements in both threads. This behaviour, however, should be forbidden in the example to the right, since there is a symmetric data dependence.


We want to capture the behaviour of a multi-threaded (Java) program with a liberal memory model without having to check all possible compiler transformations — the correctness of such transformations would actually depend on the program semantics including the memory model. In fact, in the current Java memory model out-of-order executions have to be justified by other legal executions. We interpret these justifications as witnesses in terms of knowledge-based programs; our current exposition, however, neglects synchronisation. We first represent the state space of a two-threaded (Java) program like the ones above by the following tEmIc declarations:

```
var x, y, r1, r2 : 0..2 initial x = 0 & y = 0 & r1 = 0 & r2 = 0;
var step1, step2 : 1..3 initial step1 = 1 & step2 = 1;
agent t1 = { step1, r1 }; agent t2 = { step2, r2 };
```
The thread agents t1 and t2 can only observe their local registers and their program counters. The program steps for both threads are turned into actions like

```
action t1_1 guard step1 = 1 do r1 := x, step1 := step1+1;
action t1_2 guard step1 = 2 do y := 1, step1 := step1+1;
```
Additionally, we allow for a "prescient reading" of the value v from the main memory variable x by thread θ into the local variable r at step s by the following action:

```
action readθ_x_v_r_s
guard stepθ = s and K[θ] (EF (r = 0 & x = v) and EF (r = v & x = v))
do r := v, stepθ := stepθ+1;
```
The thread θ can read v from x into r early on if it *knows* that *there is an execution* where x has value v without dependence on already setting r to v, and, furthermore, that *there is an execution* where the early setting is confirmed. The statement r1 = x; of the first thread is expanded into three read actions read1\_x\_0\_r1\_1, read1\_x\_1\_r1\_1, and read1\_x\_2\_r1\_1 plus the plain reading action t1\_1. With this encoding, tEmIc reports that for the first example to the left it is indeed possible to obtain r1 = r2 = 1 in the least constructive fixed point, but that this is impossible for the example to the right.
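The expansion of a read statement into one prescient read action per value can be sketched as a plain text template. This is only an illustration: the helper `prescient_read` is hypothetical, and it assumes that the agent of thread θ is named `tθ` and its program counter `stepθ`, as in the declarations above.

```python
def prescient_read(thread, var, value, reg, step):
    """Instantiate the prescient-read template for one value (text only)."""
    agent = f"t{thread}"   # assumed agent naming, as in "agent t1 = {...}"
    pc = f"step{thread}"   # assumed program counter naming
    return (
        f"action read{thread}_{var}_{value}_{reg}_{step}\n"
        f"guard {pc} = {step} and K[{agent}] "
        f"(EF ({reg} = 0 & {var} = {value}) and EF ({reg} = {value} & {var} = {value}))\n"
        f"do {reg} := {value}, {pc} := {pc}+1;"
    )


# The shared variable x ranges over 0..2, so r1 = x at step 1 yields three actions.
actions = [prescient_read(1, "x", v, "r1", 1) for v in range(3)]
print("\n\n".join(actions))
```

The plain reading action `t1_1` is kept alongside the generated actions, so that a read without foresight remains possible.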

A more intriguing case is presented by the following two examples: According to Manson et al. [23, pp. 35f.] (cf. also [2]), the program to the left can result in r1 = r2 = r3 = 1:


A compiler could see that only 0 and 1 are possible for x and y and "can then replace r2 = x by r2 = 1, because either 1 was read from x on line 1 and there is no intervening write, or 0 was read from x on line 1, 1 was assigned to x on line 3, and there was no intervening write"; this definite assignment can be used to transform the last line to y = 1; which finally can be made the first action of the first thread, as there are no dependencies. But the same transformation is not possible for the program to the right, and there the same behaviour should be disallowed. Still, the left program is the result of inlining the second thread into the first. Our encoding of the two programs in tEmIc confirms these considerations and the witness for the left program indeed first sets r3 to 1 and confirms this only in the last step setting y to 1.

# **7 Conclusions and Future Work**

We have introduced a must/can analysis for the interpretation of knowledge-based programs inspired by the constructive semantics of synchronous programming languages. The resulting constructive interpretation provides lower and upper bounds for the possible executions. This interpretation has been shown to be monotone and to yield a least fixed point. We have also transformed knowledge-based programs to general rule systems with positive and negative premisses. Finally, we have described our tool tEmIc for constructive interpretation and temporal-epistemic model checking over CTLK and demonstrated some applications of interpreting knowledge-based programs including CTLK-guards.

Our epistemic logic could be complemented by group knowledge [14, Ch. 6], like common or distributed knowledge. The temporal dimension could be extended to "Linear-Time Logic" (LTL), and, more importantly, to include some notion of fairness. Criteria for ensuring decided least fixed points for the must/can interpretation beyond synchronicity would be desirable. Also a comparison with non-monotone inductive definitions [12], SOS rules with negative premisses [24], and solution strategies for epistemic specifications [5], would be of interest. On the other hand, the general constructive approach may be useful to complement existing intuitionistic approaches to the semantics of synchronous programming languages [22]. Finally, the domain of memory models should be covered more comprehensively by interpreting knowledge-based programs.

# **References**



# Contextual Modal Type Theory with Polymorphic Contexts

Yuito Murase<sup>1</sup>, Yuichi Nishiwaki<sup>2</sup>, and Atsushi Igarashi<sup>1</sup>

<sup>1</sup> Kyoto University, Kyoto, Japan {murase@fos.kuis.kyoto-u.ac.jp, igarashi@kuis.kyoto-u.ac.jp} <sup>2</sup> Tokyo, Japan yuichi.nishiwaki@icloud.com

Abstract. Modal types—types that are derived from proof systems of modal logic—have been studied as theoretical foundations of metaprogramming, where program code is manipulated as first-class values. In modal type systems, modality corresponds to a type constructor for code types and controls free variables and their types in code values. Nanevski et al. have proposed contextual modal type theory, which has modal types with fine-grained information on free variables: modal types are explicitly indexed by contexts—the types of all free variables in code values.

This paper presents λ<sup>∀</sup>[], a novel extension of contextual modal type theory with parametric polymorphism over contexts. Such an extension has been studied in the literature but, unlike earlier proposals, λ<sup>∀</sup>[] is more general in that it allows multiple occurrences of context variables in a single context. We formalize λ<sup>∀</sup>[] with its type system and operational semantics given by β-reduction and prove its basic properties including subject reduction, strong normalization, and confluence. Moreover, to demonstrate the expressive power of polymorphic contexts, we show a type-preserving embedding from a two-level fragment of Davies' λ<sup>○</sup>, which is based on linear-time temporal logic, to λ<sup>∀</sup>[].

Keywords: Contextual modal types, Fitch-style modal lambda-calculi, Metaprogramming, Polymorphic contexts

# 1 Introduction

It is a common technique in metaprogramming to use code as a first-class value to generate, combine, and evaluate code at compile- and run-time. Type systems for first-class code are known to correspond to proof systems of modal logic under the Curry–Howard isomorphism [5,19,6,30,17]: Modality corresponds to a type constructor for code types, controlling free variables and their types in code values. Such modal type systems have been proposed for various areas of metaprogramming, including multi-stage computation [29,2,13], syntactic metaprogramming [7,27], and, more recently, applied to proof assistants [3,21,26].

Modal types come in two flavors: implicit and explicit contexts. On the one hand, modal types with implicit contexts do not show the typing contexts (free variables and their types) of code values. A classical example of a modal type system with implicit contexts is λ<sup>○</sup> [5], in which a code type is expressed by ○T ("code of T"), no matter what variables are referenced in the code. It has been applied to real programming languages for multi-stage programming, such as MetaOCaml [2,13]. Since the type operator ○ is derived from the modality "next" in linear-time temporal logic, we call these code types linear-time temporal types. On the other hand, modal types with explicit contexts show typing contexts in code types. For example, the type of code x+2 is expressed by [x : int]int, which stands for code of an integer expression that includes free occurrences of an integer variable x. Such types are often called contextual modal types [17]. Prior work points out that contextual modal types have advantages over linear-time temporal types in dealing with mutable reference cells and run-time code evaluation [12,24,14], although they have not been actively applied to real multi-stage programming languages so far. Contextual modal types are better known for their applications to proof assistants [20,3,21,26], where users can operate on code representations of proof terms with explicit contexts.

Some previous work [12,16,3,21,23] on contextual modal types has suggested polymorphic contexts, that is, polymorphism over typing contexts in contextual modal types, to abstract part of a typing context by context variables γ. For example, a type ∀γ.[γ]T<sub>1</sub> → [γ]T<sub>2</sub> denotes functions that take code of type T<sub>1</sub> under an arbitrary typing context γ and return code of type T<sub>2</sub> under the same typing context γ. Although polymorphic contexts can be expected to play an important role in metaprogramming with contextual modal types, their type-theoretic foundations have not yet been fully investigated.

Our contributions. This paper proposes a novel contextual modal type theory λ<sup>∀</sup>[] that provides a type-theoretic foundation for polymorphic contexts. Our technical contributions are summarized below:


Organization of the paper. Section 2 provides motivating examples from metaprogramming. Our formal development starts with a simple Fitch-style modal type theory λ[] in Section 3. We extend λ[] to λ<sup>∀</sup>[] with polymorphic contexts and prove subject reduction in Section 4; we prove strong normalization of λ<sup>∀</sup>[] in Section 5. Section 6 develops a sound embedding from linear-time temporal types to contextual modal types. Finally, we discuss related work in Section 7 and give a conclusion in Section 8.

# 2 Motivation

This section provides examples from common metaprogramming use cases. We use a hypothetical OCaml-like language with the contextual modal types we present later. Note that the language is only meant to illustrate the informal ideas of the type theory and is not intended as a practical language.

#### 2.1 Simple Contextual Modal Types: Specializing Power Function

First, we show a typical example from staged computation, the power function, to demonstrate how we can use contextual modal types for staged computation.

```
(* val pow : int -> [int |- int] *)
let rec pow n = match n with
  | 0 -> '<x: int> 1
  | n -> let u = pow (n-1) in '<x: int>(x * ,1(u)[x])
(* val power4 : int -> int *)
let power4 = ,0('<>(fun x:int -> ,1(pow 4)[x]))[]
```
The function pow generates a piece of code x \* (. . . \* (x \* 1). . . ) that multiplies the variable x n times; the definition of power4 puts the code generated by pow under a function abstraction and evaluates the code at run-time to obtain a function value that computes x<sup>4</sup> without recursion.
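The staging idea can be approximated in plain Python, as a rough sketch only: code values are represented as source strings over a free variable `x`, and `eval` plays the role of the stage-0 unquote. The names `pow_code` and `make_power` are hypothetical, and strings carry none of the typing guarantees that contextual modal types provide.

```python
def pow_code(n: int) -> str:
    """Generate source text for x * (... * (x * 1)...) with n factors of x."""
    if n == 0:
        return "1"
    # Splice the code for n-1 into a larger expression, like ,1(u)[x] inside a quote.
    return f"(x * {pow_code(n - 1)})"


def make_power(n: int):
    """Put the generated body under a function abstraction and run the code once."""
    source = f"lambda x: {pow_code(n)}"
    return eval(source)  # stands in for the stage-0 unquote


power4 = make_power(4)
```

The recursion happens only while the code is generated; the resulting `power4` is a closed function whose body is a flat product.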

This example uses two constructs for code manipulation: quote of the form '<Γ>M and unquote of the form ,n(M)[M1, . . . , Mk]. The former, which is similar to quasi-quotation in Lisp, generates code of an expression M paired with a variable environment Γ under which the code is evaluated. In the example, the quote '<x: int> 1 is code of the constant 1 with an environment consisting of a single integer variable x. The quote has the contextual modal type [int |- int], where the antecedent (int on the left of |-) corresponds to the environment x:int and the succedent (int on the right) to the code body.

Given a contextual modal type [C ⊢ T], we call C a context. A context is a sequence of types and does not involve variables. Similarly to de Bruijn indices, we identify variables in a context by their position rather than by their names. For instance, the two quotes '<x:int, y:int>x and '<z:int, w:int>z are considered α-equivalent because both use the first variable in the environment, even though the variable names in the two environments are different. Both terms have the same type [int, int |- int].
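The positional reading of environments can be illustrated with a small Python sketch. The representation is an assumption made for illustration: a quote is a pair of an environment (name and type pairs) and a body, where a body is a name, a constant, or a tuple `(operator, arg1, ...)`; the names `Quote`, `positional`, and `alpha_equivalent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    env: tuple    # (name, type) pairs, e.g. (("x", "int"), ("y", "int"))
    body: object  # a name (str), a constant, or a tuple ("op", arg1, arg2, ...)


def positional(quote: Quote):
    """Return (context, body) with environment variables replaced by their positions."""
    index = {name: i for i, (name, _ty) in enumerate(quote.env)}

    def go(term):
        if isinstance(term, tuple):
            head, *args = term
            return (head, *(go(a) for a in args))
        if isinstance(term, str) and term in index:
            return ("var", index[term])   # identified by position, not by name
        return term

    context = tuple(ty for _name, ty in quote.env)
    return context, go(quote.body)


def alpha_equivalent(q1: Quote, q2: Quote) -> bool:
    """Two quotes are identified when context and positional body coincide."""
    return positional(q1) == positional(q2)
```

With this representation, the two quotes from the text map to the same pair of the context `("int", "int")` and the body `("var", 0)`.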

An unquote ,n(M)[M1, . . . , Mk] is used to expand a code value M. For example, ,1(u)[x] expands u of type [int |- int]. In addition to the code to be expanded, an unquote involves two annotations: an explicit substitution [M1, . . . , Mk] and a stage transition n. An explicit substitution provides the definitions of the variables in the environment of a quote value. In the example code, ,1(u)[x] supplies the explicit substitution [x] as the definition for the single-variable context int. If u is '<y:int>y \* 1, then the unquote will expand to x \* 1, replacing y with its definition x. Roughly speaking, a stage transition represents the number of nested quotes surrounding M. The expression ,1(u)[x] applies the explicit substitution to u and splices the obtained code into the surrounding quote. Thus, the code '<x: int>(x \* ,1(u)[x]) adds "x \*" to the code denoted by u. In contrast, the unquote ,0('<>(fun x:int -> ,1(pow 4)[x]))[] computes '<>(fun x:int -> ,1(pow 4)[x]) (to obtain the code value fun x:int -> (x \* (x \* (x \* (x \* 1)))) with the empty environment) and expands it; since there is no surrounding quote, the expansion amounts to running the code. In this sense, the unquote in this language can be regarded as the unquote of Lisp-like languages if the stage transition is 1 and as an eval function if it is 0.
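The effect of an explicit substitution on a code value can be sketched as follows. This is a simplified illustration that ignores stage transitions and variable capture; code bodies are assumed to be names, constants, or tuples `(operator, arg1, ...)`, and the function name `unquote` is hypothetical.

```python
def unquote(env_names, body, args):
    """Expand the quote '<env_names>body using the explicit substitution args."""
    if len(env_names) != len(args):
        raise ValueError("explicit substitution must cover the whole environment")
    mapping = dict(zip(env_names, args))

    def go(term):
        if isinstance(term, tuple):          # ("op", arg1, arg2, ...)
            head, *rest = term
            return (head, *(go(t) for t in rest))
        if isinstance(term, str):            # a variable: use its definition if any
            return mapping.get(term, term)
        return term                          # a constant

    return go(body)


# If u is '<y:int> y * 1, then ,1(u)[x] expands to x * 1.
u_env, u_body = ["y"], ("*", "y", 1)
expanded = unquote(u_env, u_body, ["x"])
```

Splicing `expanded` under another multiplication by `x` gives the step performed by the recursive case of pow.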

# 2.2 Polymorphic Contexts: Macro repeat

Secondly, consider a macro called repeat, which repeats a given piece of code n times. For example, we expect Lisp code (repeat 2 (print "hello")) to show hello two times. We can imitate such a macro as follows:

```
(* val repeat : int -> [string -> unit |- unit]
                    -> [string -> unit |- unit] *)
let rec repeat n body = match n with
  | 0 -> '<pr: string -> unit>(())
  | n -> let u = repeat (n-1) body in
         '<pr: string -> unit>(,1(u)[pr]; ,1(body)[pr])
```
This function repeat takes an integer n for the number of repetitions and the code to be repeated. For example, the Lisp macro call (repeat 2 (print "hello")) can be represented as follows.

```
,1(repeat 2 '<pr:string -> unit>(pr "hello"))[print]
```
To model macro expansion, we assume the whole code with macro calls is surrounded by a quote; hence, we use the stage transition 1, instead of 0, to splice the result of the macro call of repeat. Note that the variable pr in the environment pr:string -> unit is expected to be bound to the function print. After applying the function repeat, we obtain the following code.

```
,1('<pr:string -> unit>(pr "hello"; pr "hello"; ()))[print]
```
Finally, by evaluating the unquote, the code is fully expanded (substituting the library function print for pr) to

```
print "hello"; print "hello"; () .
```
A problem with the function repeat is that it only accepts code values whose environment consists of a single variable of type string -> unit. We would rather expect the function to accept code values with various patterns of contexts and to have multiple types that differ only in their contexts, e.g.,

```
– int -> [string -> unit |- unit] -> [string -> unit |- unit],
– int -> [string -> unit, int, int |- unit]
  -> [string -> unit, int, int |- unit], and
– int -> [unit -> unit |- unit] -> [unit -> unit |- unit] .
```
We will resolve this issue by abstracting the context part of the function type with a context variable G. As a result, we obtain the type for a generic repeat: forall G. int -> [G |- unit] -> [G |- unit]. We call a type starting with forall G. a polymorphic context type, which means that we can instantiate the context variable G with any context. We can implement this generic function poly\_rep by using a context variable as follows.

```
(* val poly_rep : forall G. int -> [G |- unit] -> [G |- unit] *)
let rec poly_rep [G] n body =
  match n with
  | 0 -> '<xs: G>(())
  | n -> let u = poly_rep [G] (n-1) body in
         '<xs: G>(,1(u)[xs]; ,1(body)[xs])
```
This function takes an additional context argument G, which is used in quotes. xs is a series variable, a novel sort of variable introduced in this paper. A series variable stands for a sequence of (ordinary) variables, corresponding to the fact that a context variable stands for a sequence of types, and forms an environment by pairing with a context variable. For example, xs:G will represent the environment x:int, y:string if we substitute x, y for xs, and int, string for G. We can also use series variables in explicit substitutions. If we use a series variable in an explicit substitution, as in ,1(u)[xs], xs stands for an explicit substitution consisting of a series of variables. For instance, if xs:G expands to x:int, y:string, then ,1(u)[xs] expands to ,1(u)[x,y]. In this case, series variables work like identity substitutions in prior work [26,3,21,23], which pass variables from an environment to explicit substitutions as-is.

Using poly\_rep, we can repeat code with two variables as follows:

```
poly_rep [unit->int, int->unit] 3
  ('<rand:unit->int, printInt:int->unit>(printInt(rand())))
```
We apply poly\_rep to the context unit->int, int->unit in order to instantiate the context variable G. It is worth noting that the series variables accompanied by G will also be replaced automatically with fresh variables. In this case, the quote '<xs: G>(,1(u)[xs]; ,1(body)[xs]) will turn into

```
'<x: unit->int, y:int->unit>(,1(u)[x,y]; ,1(body)[x,y])
```
where the series variable xs is replaced with fresh variables x,y. This way, a mapping between variables and types is well maintained.
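The instantiation of a series variable by fresh variables can be mimicked in Python. The sketch below is only an approximation under stated assumptions: a code value is a pair of an environment (a list of names) and a list of statements, statements are tuples `(operator, arg1, ...)`, and all function names are hypothetical.

```python
from itertools import count

_counter = count()


def fresh_names(n):
    """Return n fresh variable names, standing in for an instantiated series variable."""
    return [f"v{next(_counter)}" for _ in range(n)]


def substitute(term, mapping):
    """Replace variables in a term; the head of a tuple is an operator, not a variable."""
    if isinstance(term, tuple):
        head, *args = term
        return (head, *(substitute(a, mapping) for a in args))
    return mapping.get(term, term)


def splice(code, names):
    """Approximate ,1(code)[names]: rename the environment of `code` to `names`."""
    env, stmts = code
    mapping = dict(zip(env, names))
    return [substitute(s, mapping) for s in stmts]


def poly_rep(context, n, body):
    """Repeat `body` n times under a fresh environment for `context`."""
    assert len(body[0]) == len(context), "body must be code under the given context"
    xs = fresh_names(len(context))            # xs : G after instantiation
    if n == 0:
        return (xs, [])                       # '<xs: G>(())
    u = poly_rep(context, n - 1, body)
    return (xs, splice(u, xs) + splice(body, xs))
```

For instance, repeating `printInt(rand())` three times under the context `unit->int, int->unit` yields three copies of the statement, all phrased in terms of the two fresh variables of the outermost quote.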

# 2.3 More Polymorphic Contexts: Combining Different Environments

Sometimes, we might want to use pieces of code with different environments. Consider a function generic\_plus, which takes two pieces of code as arguments and returns a piece of code that sums the values of the two arguments. We can implement such a function with ease.

```
(* val generic_plus:
        forall G H. [G |- int] -> [H |- int] -> [G, H |- int] *)
let generic_plus [G H] x y = '<xs:G, ys:H>(,1(x)[xs] + ,1(y)[ys])
```
It takes two context variables G and H and puts them together in the same context. As a result, we can use variables from both contexts. Although this example is very simple, it demonstrates the novel feature of our contextual modal type theory: it permits multiple occurrences of context variables in the same context, as in [G, H |- int]. To the best of our knowledge, previous work that supports context polymorphism only allows a single occurrence of a context variable in a context. We discuss the details in Section 7.
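The way two environments are concatenated can be sketched in Python as follows. This is an illustrative approximation: a code value is assumed to be a pair of an environment (a list of name and type pairs) and a body, and the generated names `g0, g1, ...` and `h0, h1, ...` are hypothetical stand-ins for the series variables xs and ys.

```python
def substitute(term, mapping):
    """Replace variables in a term; the head of a tuple is an operator, not a variable."""
    if isinstance(term, tuple):
        head, *args = term
        return (head, *(substitute(a, mapping) for a in args))
    return mapping.get(term, term)


def generic_plus(x, y):
    """Combine code x under context G and code y under context H into code under G, H."""
    (g, x_body), (h, y_body) = x, y
    xs = [f"g{i}" for i in range(len(g))]      # xs : G
    ys = [f"h{i}" for i in range(len(h))]      # ys : H
    env = [(n, ty) for n, (_, ty) in zip(xs, g)] + [(n, ty) for n, (_, ty) in zip(ys, h)]
    body = (
        "+",
        substitute(x_body, dict(zip((name for name, _ in g), xs))),   # ,1(x)[xs]
        substitute(y_body, dict(zip((name for name, _ in h), ys))),   # ,1(y)[ys]
    )
    return (env, body)
```

Because each argument is renamed into its own block of variables, the two pieces of code may even use the same variable names without interfering with each other.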

One may wonder whether multiple occurrences of context variables are useful. As we show in Section 6, this feature is crucial for achieving the expressiveness of the multi-stage programming languages in the literature.

# 3 Simple Fitch-Style Contextual Modal Type Theory

As an introduction to contextual modal types, this section formulates a simple contextual modal type theory λ[] without polymorphic contexts. Nanevski et al. [17] formulated their original contextual modal type theory in dual-context style [19,6,11], which has judgments with two-level contexts. In contrast, we formulate λ[] in so-called Fitch- or Kripke-style [4,1,15,6,31]. We choose this design because the Fitch-style formulation provides Lisp-like quote/unquote syntax, which is akin to that in linear-time temporal type theories [5,30], and hence it is easier to compare these two type theories. We give a formal comparison in Section 6.

We obtain λ[] by extending S4 Fitch-style modal calculus with contextual modal type theory. One can consider it a combination of the Fitch-style modal calculi by Valliappan et al. [31], and the contextual extension by Nanevski et al. [17]. At the same time, we tweak definitions for an extension to polymorphic contexts in Section 4.

#### 3.1 Syntax and Type System

Types and terms in λ[] are shown in Fig. 1. Types consist of base types ranged over by ι, function types S → T, and contextual modal types [C ⊢ T]. A contextual modal type [C ⊢ T] generalizes an S4 modal type □T by adding a context C, which is a finite sequence of types. It describes code of type T with free variables whose types are C. Note that a contextual modal type with an empty context [• ⊢ T] has the same meaning as □T, which denotes closed code

```
Types              S, T ::= ι | S → T | [C ⊢ T]
Contexts           C, D ::= • | C, T
Stage transitions  k ∈ ℕ
Terms              M, N ::= x | λx^T.M | M N | quo⟨Γ̂⟩M | unq_k M[θ]
Explicit Subst.    θ ::= • | θ, M
Named Contexts     Γ, ∆ ::= • | Γ, x : T | Γ, µ
                   (Γ̂ and ∆̂ denote named contexts with no µ.)
```
Fig. 1. Syntax of λ[]

of type T. In addition to the standard terms of the simply typed lambda calculus, λ[] has two forms, quote quo⟨Γ̂⟩M and unquote unq<sub>k</sub>M[θ]. We define stage transitions as natural numbers, and explicit substitutions as sequences of terms.

We often use the term named contexts for typing contexts with variables and use "contexts" for type-only ones. Similarly to other Fitch-style formulations, λ[] extends named contexts with a special symbol µ (called lock) that delimits levels of variables. For example, in a named context x : T<sub>1</sub>, µ, y : T<sub>2</sub>, z : T<sub>3</sub>, the variable x is one level higher than y and z (we will revisit the notion of levels in the definition of free variables). A named context is well formed iff it contains no duplicate variables; we assume that all named contexts are well formed. We also require a named context in a quote to be single-level, i.e., not to contain µ. We write Γ̂ for such µ-free named contexts. rg(Γ̂) denotes the range of Γ̂, a context obtained by forgetting the variables in Γ̂, and dom(Γ) denotes the domain of Γ, the set of variables in Γ (locks can appear in Γ, unlike in the argument of rg). We also define the weakening relation Γ<sub>1</sub> ≤ Γ<sub>2</sub> as follows.

$$\frac{}{\bullet \leq \bullet} \qquad \frac{\Gamma_1 \leq \Gamma_2}{\Gamma_1, x : T \leq \Gamma_2, x : T} \qquad \frac{\Gamma_1 \leq \Gamma_2}{\Gamma_1 \leq \Gamma_2, x : T} \qquad \frac{\Gamma_1 \leq \Gamma_2}{\Gamma_1, \mu \leq \Gamma_2, \mu}$$

As is common in other Fitch-style formulations, λ[] has a somewhat complex binding structure. We show the definition of free variables in Fig. 2. For a term M and an integer k, FV<sub>k</sub>(M) is the set of free variables in M at level k, where k roughly stands for the number of quotes surrounding M. Since an unquote unq<sub>k<sub>1</sub></sub>M[θ] cancels k<sub>1</sub> surrounding quotes, the level is lowered by k<sub>1</sub>. λ[] has two binding forms: a lambda abstraction λx<sup>T</sup>.M binds all level-0 free occurrences of x in M, and a quote quo⟨Γ̂⟩M binds all level-0 free variables from Γ̂ in M. According to these binding forms, we define α-equivalence (but omit its definition). For example, λx<sup>[T<sub>1</sub>⊢T<sub>2</sub>]</sup>.quo⟨x : T<sub>1</sub>⟩(unq<sub>1</sub>(x)[x]) is α-equivalent to λy<sup>[T<sub>1</sub>⊢T<sub>2</sub>]</sup>.quo⟨z : T<sub>1</sub>⟩(unq<sub>1</sub>(y)[z]): the occurrence of x under unq<sub>1</sub> is bound by the lambda abstraction, whereas the occurrence in the explicit substitution is bound by the quote. As we shall see later, the typing rules of λ[] force well-typed terms to be closed with regard to negative-level free variables. Thus, we only care about non-negative-level free variables in this paper and assume that the meta variable k ranges over natural numbers.
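The level-indexed definition of free variables (Fig. 2) can be transcribed into a short Python function. The tuple encoding of terms is an assumption made for this sketch: `("var", x)`, `("lam", x, body)`, `("app", m, n)`, `("quo", [names], body)`, and `("unq", k1, m, [terms])`.

```python
def fv(term, k):
    """Free variables of `term` at level k, following the clauses of Fig. 2."""
    tag = term[0]
    if tag == "var":                          # ("var", x)
        return {term[1]} if k == 0 else set()
    if tag == "lam":                          # ("lam", x, body)
        _, x, body = term
        return fv(body, k) - {x} if k == 0 else fv(body, k)
    if tag == "app":                          # ("app", m, n)
        return fv(term[1], k) | fv(term[2], k)
    if tag == "quo":                          # ("quo", [names], body)
        _, names, body = term
        return fv(body, 0) - set(names) if k == -1 else fv(body, k + 1)
    if tag == "unq":                          # ("unq", k1, m, [terms])
        _, k1, m, theta = term
        result = set(fv(m, k - k1))
        for t in theta:
            result |= fv(t, k)
        return result
    raise ValueError(f"unknown term: {term!r}")


# quo<x : T1>(unq_1(x)[x]): the x under unq_1 is free at level 0 of the quote,
# so an enclosing lambda binds it; the x in the substitution is bound by the quote.
inner = ("unq", 1, ("var", "x"), [("var", "x")])
quote = ("quo", ["x"], inner)
```

Evaluating `fv(quote, 0)` gives `{"x"}`, which is the occurrence captured by an enclosing `λx`, while `fv(quote, -1)` is empty because the quote binds the remaining occurrence.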

Typing rules are given in Fig. 3. The judgment k : Γ ◁ ∆ states that there is a stage transition k between two named contexts Γ and ∆. The rules mean that k is the number of locks between Γ and ∆, e.g., 0 : x : T ◁ x : T and 2 : y : T<sub>1</sub> ◁ y : T<sub>1</sub>, µ, µ, z : T<sub>2</sub>. The judgments Γ ⊢ M : T and Γ ⊢ θ : C state

FV<sub>k</sub>(M)   FV<sub>k</sub>(θ)

$$\begin{aligned} \mathsf{FV}_{k}(x) &= \begin{cases} \{x\} & \text{if } k = 0 \\ \emptyset & \text{otherwise} \end{cases} \\ \mathsf{FV}_{k}(\lambda x^{T}.M) &= \begin{cases} \mathsf{FV}_{k}(M) - \{x\} & \text{if } k = 0 \\ \mathsf{FV}_{k}(M) & \text{otherwise} \end{cases} \\ \mathsf{FV}_{k}(M\,N) &= \mathsf{FV}_{k}(M) \cup \mathsf{FV}_{k}(N) \\ \mathsf{FV}_{k}(\mathsf{quo}\langle\hat{\Gamma}\rangle M) &= \begin{cases} \mathsf{FV}_{0}(M) - \mathsf{dom}(\hat{\Gamma}) & \text{if } k = -1 \\ \mathsf{FV}_{k+1}(M) & \text{otherwise} \end{cases} \\ \mathsf{FV}_{k_2}(\mathsf{unq}_{k_1} M[\theta]) &= \mathsf{FV}_{k_2 - k_1}(M) \cup \mathsf{FV}_{k_2}(\theta) \\ \mathsf{FV}_{k}(\bullet) &= \emptyset \\ \mathsf{FV}_{k}(\theta, M) &= \mathsf{FV}_{k}(\theta) \cup \mathsf{FV}_{k}(M) \end{aligned}$$

Fig. 2. Free variables. This definition assumes that k is an integer, but the typing rules enforce that FV<sub>k</sub>(M) = ∅ if k < 0.

that term M has type T and explicit substitution θ has context C under named context Γ, respectively. The rules for a variable x, a lambda abstraction λx<sup>T</sup>.M, and an application M<sub>1</sub> M<sub>2</sub> are almost the same as those in the simply typed lambda calculus, except that we only care about variables from tail(Γ), the level-0 part of Γ. The type of a quote quo⟨Γ̂⟩M is derived by popping all level-0 variables in the named context (recall that locks do not appear in Γ̂). Thus, Γ̂ binds all level-0 free variables in M. An unquote unq<sub>k</sub>M[θ] uses θ as a substitution for the context C, and k as the stage transition between M and θ. We call a judgment derivable when it is derived from these typing rules. We assume that judgments in this paper are derivable unless stated otherwise.

#### 3.2 Substitution

We define substitution on terms and explicit substitutions. We follow the style of Valliappan et al. [31], who propose simultaneous substitution on all free variables at any level. We provide definitions related to substitutions in Fig. 4.

A substitution typing judgment ⊢ σ : ∆ ⇒ Γ denotes that we can replace a named context ∆ with another named context Γ by applying a substitution σ, e.g., ⊢ (z := x y) : (z : T<sub>2</sub>) ⇒ (x : T<sub>1</sub> → T<sub>2</sub>, y : T<sub>1</sub>). A lock substitution µ<sub>k</sub> has two roles. First, it provides information on the level of the free variables to be substituted. For example, if σ = σ<sub>1</sub>, µ<sub>k</sub>, σ<sub>2</sub> where σ<sub>2</sub> does not contain lock substitutions, σ<sub>2</sub> substitutes level-0 free variables, and σ<sub>1</sub> substitutes higher-level free variables. Second, it replaces locks themselves: if σ contains a lock substitution µ<sub>k</sub>, it replaces a lock in ∆ with k locks in Γ.

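k : Γ ◁ ∆

$$\frac{}{0 : \Gamma \lhd \Gamma} \qquad \frac{k : \Gamma \lhd \Delta}{k : \Gamma \lhd \Delta, x : T} \qquad \frac{k : \Gamma \lhd \Delta}{k + 1 : \Gamma \lhd \Delta, \mu}$$

Γ ⊢ M : T   Γ ⊢ θ : C

$$\frac{x : T \in \mathsf{tail}(\Gamma)}{\Gamma \vdash x : T} \qquad \frac{\Gamma, x : T_1 \vdash M : T_2}{\Gamma \vdash \lambda x^{T_1}.M : T_1 \to T_2} \qquad \frac{\Gamma \vdash M_1 : T_1 \to T_2 \quad \Gamma \vdash M_2 : T_1}{\Gamma \vdash M_1\,M_2 : T_2}$$

$$\frac{\Gamma, \mu, \hat{\Delta} \vdash M : T}{\Gamma \vdash \mathsf{quo}\langle\hat{\Delta}\rangle M : [\mathsf{rg}(\hat{\Delta}) \vdash T]} \qquad \frac{\Gamma \vdash M : [C \vdash T] \quad \Delta \vdash \theta : C \quad k : \Gamma \lhd \Delta}{\Delta \vdash \mathsf{unq}_k M[\theta] : T}$$

$$\frac{}{\Gamma \vdash \bullet : \bullet} \qquad \frac{\Gamma \vdash \theta : C \quad \Gamma \vdash M : T}{\Gamma \vdash \theta, M : C, T}$$

Auxiliary function

$$\mathsf{tail}(\bullet) = \bullet \qquad \mathsf{tail}(\Gamma, \mu) = \bullet \qquad \mathsf{tail}(\Gamma, x : T) = \mathsf{tail}(\Gamma), x : T$$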

Fig. 3. Typing rules of λ[]
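The stage-transition judgment and the auxiliary function tail can be read operationally. The following Python sketch assumes that a named context is a list whose entries are either `(name, type)` pairs or the sentinel `LOCK`; the names `LOCK` and `stage_transition` are hypothetical.

```python
LOCK = "LOCK"  # stand-in for the lock symbol in named contexts


def tail(ctx):
    """Level-0 part of a named context: the bindings after the last lock."""
    result = []
    for entry in reversed(ctx):
        if entry == LOCK:
            break
        result.append(entry)
    return list(reversed(result))


def stage_transition(gamma, delta):
    """Return k such that k : gamma ◁ delta is derivable, or None if there is none.

    By the three rules, delta must extend gamma by bindings and locks,
    and k is the number of locks in that extension.
    """
    if delta[:len(gamma)] != gamma:
        return None
    return sum(1 for entry in delta[len(gamma):] if entry == LOCK)
```

For the second example in the text, `stage_transition([("y", "T1")], [("y", "T1"), LOCK, LOCK, ("z", "T2")])` counts two locks.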

Substitution application on terms M [σ] and explicit substitutions θ[σ] performs actual substitution operations. They are defined to satisfy the following lemma, which is expected by the intuition of substitution typing.

Lemma 1 (Substitution Lemma). If Γ ⊢ M : T and ⊢ σ : Γ ⇒ ∆, then ∆ ⊢ M[σ] : T.

For example, let us consider Γ ⊢ (unq<sub>1</sub>(x)[y]) y : T, where Γ = x : [S ⊢ S → T], µ, y : S. We can construct the following substitution that provides a term for each variable in Γ.

$$\vdash (x := x', \mu_0, y := z\,w) : \Gamma \Rightarrow (x' : [S \vdash S \to T],\ z : S \to S,\ w : S)$$

This substitution replaces level-0 occurrences of y with z w and level-1 occurrences of x with x′. The lock substitution µ<sub>0</sub> denotes that level-1 free variables of target terms are mapped to level-0 terms; that is why the level-0 term x′ is supplied for the level-1 variable x. We can observe that the substitution is applied as follows.

$$((\mathsf{unq}_1(x)[y])\,y)[x := x', \mu_0, y := z\,w] \tag{1}$$

$$= (\mathsf{unq}_1(x)[y])[x := x', \mu_0, y := z\,w]\;(y[x := x', \mu_0, y := z\,w]) \tag{2}$$

$$= (\mathsf{unq}_0(x[x := x'])[y[x := x', \mu_0, y := z\,w]])\;(y[x := x', \mu_0, y := z\,w]) \tag{3}$$

$$= (\mathsf{unq}_0(x')[z\,w])\,(z\,w) \tag{4}$$

The most interesting equation is the one from (2) to (3). The substitution for x is shifted by 1 level, and the stage transition of the unquote changes from 1 to 0 to align staging levels. The resulting term is given type T under the new named context, as the substitution lemma states.

Substitution σ ::= • | σ, x := M | σ, µ<sub>k</sub>

⊢ σ : ∆ ⇒ Γ

$$\frac{}{\vdash \bullet : \bullet \Rightarrow \Gamma} \qquad \frac{\vdash \sigma : \Delta \Rightarrow \Gamma_1 \quad k : \Gamma_1 \lhd \Gamma_2}{\vdash (\sigma, \mu_k) : (\Delta, \mu) \Rightarrow \Gamma_2} \qquad \frac{\vdash \sigma : \Delta \Rightarrow \Gamma \quad \Gamma \vdash M : T}{\vdash (\sigma, x := M) : (\Delta, x : T) \Rightarrow \Gamma}$$

M[σ]   θ[σ]

$$\begin{aligned} x[\sigma] &= \begin{cases} M & \text{if } x := M \in \mathsf{tail}(\sigma) \\ x & \text{otherwise} \end{cases} \\ (\lambda x^{T}.M)[\sigma] &= \lambda x^{T}.(M[\sigma]) \quad \text{where } x \notin \mathsf{dom}(\mathsf{tail}(\sigma)) \text{ and } x \notin \mathsf{FV}_0(\sigma) \\ (M\,N)[\sigma] &= (M[\sigma])\,(N[\sigma]) \\ (\mathsf{quo}\langle\hat{\Gamma}\rangle M)[\sigma] &= \mathsf{quo}\langle\hat{\Gamma}\rangle(M[\sigma, \mu_1, \mathsf{id}_{\hat{\Gamma}}]) \\ (\mathsf{unq}_k M[\theta])[\sigma] &= \mathsf{unq}_{\mathsf{count}(k, \sigma)}(M[\sigma \uparrow k])[\theta[\sigma]] \\ \bullet[\sigma] &= \bullet \\ (\theta, M)[\sigma] &= \theta[\sigma], M[\sigma] \end{aligned}$$

Auxiliary functions

$$\mathsf{FV}_k(\sigma, x := M) = \mathsf{FV}_k(\sigma) \cup \mathsf{FV}_k(M) \qquad \mathsf{FV}_{k_2}(\sigma, \mu_{k_1}) = \begin{cases} \mathsf{FV}_{k_2 - k_1}(\sigma) & \text{if } k_2 \geq k_1 \\ \emptyset & \text{otherwise} \end{cases}$$

$$\begin{aligned} \mathsf{tail}(\sigma, x := M) &= \mathsf{tail}(\sigma), x := M & \mathsf{id}_{\bullet} &= \bullet \\ \mathsf{tail}(\sigma, \mu_k) &= \bullet & \mathsf{id}_{\Gamma, x : T} &= \mathsf{id}_{\Gamma}, x := x \\ \mathsf{count}(0, \sigma) &= 0 & \mathsf{id}_{\Gamma, \mu} &= \mathsf{id}_{\Gamma}, \mu_1 \\ \mathsf{count}(k_1 + 1, \bullet) &= k_1 + 1 & \sigma \uparrow 0 &= \sigma \\ \mathsf{count}(k + 1, (\sigma, x := M)) &= \mathsf{count}(k + 1, \sigma) & \bullet \uparrow (k + 1) &= \bullet \\ \mathsf{count}(k_1 + 1, (\sigma, \mu_{k_2})) &= \mathsf{count}(k_1, \sigma) + k_2 & (\sigma, x := M) \uparrow (k + 1) &= \sigma \uparrow (k + 1) \\ & & (\sigma, \mu_{k_1}) \uparrow (k_2 + 1) &= \sigma \uparrow k_2 \end{aligned}$$

Fig. 4. Substitution
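The two auxiliary functions that drive the unquote case, count and the shift σ ↑ k, can be checked on the running example with a small Python sketch. The list encoding of substitutions is an assumption made for illustration: entries are `("bind", x, M)` or `("lock", k)`, listed from left to right.

```python
def count(k, sigma):
    """count(k, sigma) from Fig. 4: the stage transition after applying sigma."""
    if k == 0:
        return 0
    if not sigma:
        return k
    *rest, last = sigma
    if last[0] == "lock":                     # ("lock", k2)
        return count(k - 1, rest) + last[1]
    return count(k, rest)                     # ("bind", x, M) is skipped


def shift(sigma, k):
    """sigma ↑ k from Fig. 4: the part of sigma that lies k locks to the left."""
    if k == 0:
        return list(sigma)
    if not sigma:
        return []
    *rest, last = sigma
    if last[0] == "lock":
        return shift(rest, k - 1)
    return shift(rest, k)


# The substitution (x := x', µ_0, y := z w) from the example above.
sigma = [("bind", "x", "x'"), ("lock", 0), ("bind", "y", ("app", "z", "w"))]
```

Here `count(1, sigma)` is 0 and `shift(sigma, 1)` is the single binding `x := x'`, matching the step from (2) to (3), where the stage transition changes from 1 to 0 and only the substitution for x is applied under the unquote.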

In Fig. 4, we also define identity substitutions id<sub>Γ</sub>, which satisfy ⊢ id<sub>Γ</sub> : Γ ⇒ Γ for any Γ. We can confirm that id<sub>Γ</sub> does not affect the result of substitution, as stated in the following lemma. We use this property to define reduction later.

Lemma 2. M[σ] = M[id<sub>Γ</sub>, σ] for any Γ.

#### 3.3 Local Soundness/Completeness and Reduction

According to Pfenning and Davies [19], the introduction and elimination rules for a type constructor should satisfy local soundness and local completeness, which correspond to β-reduction and η-expansion, respectively. We confirm that contextual modal types meet those conditions and then define reduction rules.

Local soundness states that the elimination rule is not too strong. For the case of contextual modal types, we can witness it by the following local reduction, where we obtain the derivation D′ by applying the substitution [id<sub>Γ</sub>, µ<sub>k</sub>, ∆̂ := θ], which we obtain from E and k : Γ ◁ Γ′. Here, ∆̂ := θ denotes a substitution that maps each variable in ∆̂ to the corresponding term in θ.

$$\dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ \Gamma, \mu, \hat{\Delta} \vdash M : T \end{array}}{\Gamma \vdash \mathsf{quo}\langle\hat{\Delta}\rangle M : [\mathsf{rg}(\hat{\Delta}) \vdash T]} \qquad \begin{array}{c} \mathcal{E} \\ \Gamma' \vdash \theta : \mathsf{rg}(\hat{\Delta}) \end{array} \qquad k : \Gamma \lhd \Gamma'}{\Gamma' \vdash \mathsf{unq}_k(\mathsf{quo}\langle\hat{\Delta}\rangle M)[\theta] : T} \quad\Longrightarrow\quad \begin{array}{c} \mathcal{D}' \\ \Gamma' \vdash M[\mathsf{id}_{\Gamma}, \mu_k, \hat{\Delta} := \theta] : T \end{array}$$

Local completeness states that the elimination rule is sufficiently strong. We can confirm this condition by the following local expansion (we assume that rg(∆̂) = C).

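$$\begin{array}{c} \mathcal{D} \\ \Gamma \vdash M : [C \vdash T] \end{array} \quad\Longrightarrow\quad \dfrac{\dfrac{\begin{array}{c} \mathcal{D} \\ \Gamma \vdash M : [C \vdash T] \end{array} \qquad \begin{array}{c} \vdots \\ \Gamma, \mu, \hat{\Delta} \vdash \mathsf{dom}(\hat{\Delta}) : C \end{array} \qquad \begin{array}{c} \vdots \\ 1 : \Gamma \lhd \Gamma, \mu, \hat{\Delta} \end{array}}{\Gamma, \mu, \hat{\Delta} \vdash \mathsf{unq}_1 M[\mathsf{dom}(\hat{\Delta})] : T}}{\Gamma \vdash \mathsf{quo}\langle\hat{\Delta}\rangle\,\mathsf{unq}_1 M[\mathsf{dom}(\hat{\Delta})] : [C \vdash T]}$$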

These patterns provide base cases for β-reduction and η-expansion. This paper focuses on β-reduction, which we define as follows.

Definition 1 (β-reduction). We inductively define the full reduction relation →<sub>β</sub> on terms and explicit substitutions. We show the main rules below, omitting the congruence rules. We also define →<sub>β</sub><sup>∗</sup> as the reflexive transitive closure of →<sub>β</sub>.

$$\frac{}{(\lambda x^{S}.M)\,N \to_{\beta} M[x := N]} \qquad \frac{}{\mathsf{unq}_k(\mathsf{quo}\langle\overline{x} : C\rangle M)[\theta] \to_{\beta} M[\mu_k, \overline{x} := \theta]}$$

We safely omit identity substitutions found in these rules, thanks to Lemma 2. We do not dive into the basic properties of λ[] for now because we discuss those of its extension λ<sup>∀</sup>[] in Sections 4 and 5.

# 4 Polymorphic Contexts

This section proposes a novel type theory λ<sup>∀</sup>[] that extends λ[] with polymorphic contexts. We quickly go through an overview of its syntax and semantics, focusing on the differences from λ[]. As the examples in Section 2 illustrate, the critical idea of λ<sup>∀</sup>[] is the notion of series variables, which can be regarded as the term-level representation of context variables.

#### 4.1 Syntax, Type System, and Substitution

We provide the syntax of λ<sup>∀</sup>[] in Fig. 5. First, λ<sup>∀</sup>[] has two additional sorts of variables: context variables γ, δ, standing for contexts, and series variables **x**, **y**, representing sequences of variables.

$$\begin{array}{lrcl}
\text{Types} & S, T & ::= & \ldots \mid \forall\gamma.T \\
\text{Contexts} & C, D & ::= & \ldots \mid C, \gamma \\
\text{Terms} & M, N & ::= & \ldots \mid \Lambda\gamma.M \mid M@C \\
\text{Explicit Subst.} & \theta & ::= & \ldots \mid \theta, \mathbf{x} \\
\text{Named Contexts} & \Gamma, \Delta & ::= & \ldots \mid \Gamma, \mathbf{x} : \gamma
\end{array}$$

Fig. 5. Syntax of λ<sup>∀</sup>[]

λ<sup>∀</sup>[] adds polymorphic context types of the form ∀γ.T, which binds γ in T. It represents the set of types obtained by substituting any context C for the context variable γ. Two kinds of terms Λγ.M and M@C are added as introduction and elimination for polymorphic context types. We allow C to include polymorphic context types; thus, polymorphism in λ<sup>∀</sup>[] is impredicative. The definition of contexts means that we can abstract any part of a context with context variables, e.g., ∀γ<sub>1</sub>.∀γ<sub>2</sub>.[γ<sub>1</sub>, ι, γ<sub>2</sub> ⊢ ι]. Accordingly, series variables can appear in explicit substitutions, and a pair of a series variable and a context variable can appear in a named context. FV is updated to accommodate series variables, but we omit the definition here.

It is worth noting that context variables are not subject to staging. This allows us to use the same context variable across levels. For example, the type ∀γ.[γ ⊢ [γ ⊢ T]] binds both occurrences of γ although they are at different levels. The definition of free context variables, denoted by FCV(−), is straightforward and we omit it in this paper.

We give additional typing rules and defining clauses of substitutions in Fig. 6. We also extend the auxiliary functions such as tail to accommodate the new syntax but we omit their definitions. The introduction and elimination rules for polymorphic context types are similar to those for the polymorphic types in System F [8]. The definition of context substitution T[γ := C ] for types is straightforward and omitted. The other rule for explicit substitutions states that we can add **x**: γ to an explicit substitution if it appears in the level-0 part of Γ. The point of the extension of substitution is that a series variable can only be replaced with another series variable, not an explicit substitution. With these extensions, we can confirm that the substitution lemma holds as expected.

#### 4.2 Context Substitution

We also define substitution for context variables, which is the most non-trivial part of λ<sup>∀</sup>[]. To describe the core idea of context substitution, let us consider a term quo⟨**x**: γ⟩(unq<sub>1</sub>M[**x**]). If we naively substitute a context T, δ for the context variable γ in this term, we would obtain quo⟨**x**: (T, δ)⟩(unq<sub>1</sub>M[**x**]), where **x**: (T, δ) is ill-formed as a named context. Instead, we take the following steps.


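$$\boxed{\Gamma \vdash M : T} \qquad \dfrac{\Gamma \vdash M : T \quad \gamma \notin \mathrm{FCV}(\Gamma)}{\Gamma \vdash \Lambda\gamma.M : \forall\gamma.T} \qquad \dfrac{\Gamma \vdash M : \forall\gamma.T}{\Gamma \vdash M@C : T[\gamma := C]}$$

$$\boxed{\Gamma \vdash \theta : C} \qquad \dfrac{\Gamma \vdash \theta : C \quad \mathbf{x} : \gamma \in \mathrm{tail}(\Gamma)}{\Gamma \vdash \theta, \mathbf{x} : C, \gamma}$$

Substitution σ ::= . . . | σ, **x** := **y**

$$\boxed{M[\sigma]} \quad \boxed{\theta[\sigma]} \qquad \begin{array}{rcl}
 & \vdots & \\
(\Lambda\gamma.M)[\sigma] & = & \Lambda\gamma.(M[\sigma]) \quad \text{if } \gamma \notin \mathrm{FCV}(\sigma) \\
(M@C)[\sigma] & = & (M[\sigma])@C \\
 & \vdots & \\
(\theta, \mathbf{x})[\sigma] & = & \begin{cases} \theta[\sigma], \mathbf{y} & \text{if } \mathbf{x} := \mathbf{y} \in \mathrm{tail}(\sigma) \\ \theta[\sigma], \mathbf{x} & \text{otherwise} \end{cases}
\end{array}$$

$$\boxed{\vdash \sigma : \Delta \Rightarrow \Gamma} \qquad \dfrac{\vdash \sigma : \Delta \Rightarrow \Gamma \quad \mathbf{y} : \gamma \in \mathrm{tail}(\Gamma) \quad \mathbf{x} \notin \mathrm{dom}(\Delta)}{\vdash \sigma, \mathbf{x} := \mathbf{y} : \Delta, \mathbf{x} : \gamma \Rightarrow \Gamma}$$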

new variables x , **y** for T, δ. As a result, we get a variable series substitution **x** := x , **y**.


In this way, substitution for context variables essentially requires three operations: (1) to replace context variables with contexts, (2) to generate fresh variables to be substituted for series variables, and (3) to replace series variables with sequences of variables. We start its formal definition with the following new objects. We write G<sub>v</sub> and G<sub>s</sub> for infinite sequences of ordinary variables and series variables without duplication, respectively.


A context substitution Σ maps context variables to contexts, and a variable series substitution σ̄ maps series variables to variable series, that is, sequences of ordinary/series variables. Note that series substitution does not affect stage levels; hence, locks in series substitution are not annotated with stage transitions. A variable generator consists of streams of non-duplicating variables and series variables. We use it to generate fresh variables. rg(σ̄) denotes the variable series obtained from the range of σ̄.

We define application of context substitution in Fig. 7. Application of a context substitution to types T[Σ] and contexts C[Σ] is straightforward; we simply replace context variables in a capture-avoiding manner. We omit their definitions from the figure. In contrast, context substitution on terms M[Σ; σ̄]<sub>G</sub> and explicit substitutions θ[Σ; σ̄]<sub>G</sub> takes not only Σ but also a variable series substitution σ̄ and a variable generator G. Σ is used to replace context variables in types in λ-abstractions and Γ in a quote; σ̄ is used to substitute series variables in explicit substitutions and Γ in a quote. The most interesting case is that of a quote quo⟨Γ̂⟩M: first, a variable series substitution σ̄′ is generated by the auxiliary function destruct (Step 2 above); second, Σ and the generated σ̄′ are applied to Γ̂ to yield the new named context (Step 3); finally, we apply Σ and σ̄, µ, σ̄′ to the body of the quote (Step 4), after removing the variables in dom(Γ̂) and the generated ones from the generator; here, (G<sub>v</sub>, G<sub>s</sub>) − S means (G<sub>v</sub> \ S, G<sub>s</sub> \ S). The auxiliary function destruct<sub>G</sub>(Γ, Σ) scans Γ to find context variables in the domain of Σ, generates fresh (ordinary/series) variables by using gensyms, and returns a variable series substitution. gensyms<sub>G</sub>(C, V) produces a sequence of ordinary/series variables of the same length as C; each fresh variable is the earliest element of G that is not in V.

For example, consider applying Σ = γ := T<sub>1</sub>, γ′ and the empty variable series substitution to M = quo⟨**x**: γ, x : ι, **y**: γ⟩M<sub>0</sub>. destruct<sub>G</sub>((**x**: γ, x : ι, **y**: γ), (γ := T<sub>1</sub>, γ′)) returns **x** := (x′, **x**′), **y** := (y′, **y**′) for some fresh x′, **x**′, y′, and **y**′ (with respect to G) and, thus, M[Σ; •]<sub>G</sub> is quo⟨x′ : T<sub>1</sub>, **x**′ : γ′, x : ι, y′ : T<sub>1</sub>, **y**′ : γ′⟩M′<sub>0</sub>, where M′<sub>0</sub> = M<sub>0</sub>[Σ; (•, µ, **x** := (x′, **x**′), **y** := (y′, **y**′))]<sub>G′</sub> and G′ = G − {**x**, x, **y**, x′, **x**′, y′, **y**′}. We can confirm that context substitution preserves derivable judgments.

Lemma 3 (Context Substitution Lemma).


Although we use variable generators to get fresh variables, the result of context substitution should be equivalent under renaming. We can confirm this intuition by the following lemma.

Lemma 4. If Γ ⊢ M : T, σ̄<sub>1</sub> = destruct<sub>G<sub>1</sub></sub>(Γ, Σ) and σ̄<sub>2</sub> = destruct<sub>G<sub>2</sub></sub>(Γ, Σ), then there is a renaming substitution σ such that Γ[Σ; σ̄<sub>1</sub>] ⊢ M[Σ; σ̄<sub>2</sub>]<sub>G′<sub>1</sub></sub>[σ] : T[Σ] for some G′<sub>1</sub>.

Corollary 1. If dom(Σ) ∩ FCV(Γ) = ∅ and Γ ⊢ M : T, then M[Σ; •]<sub>G<sub>1</sub></sub> =<sub>α</sub> M[Σ; •]<sub>G<sub>2</sub></sub>.

Based on this nature of context substitution, we may omit variable generators from context substitution applications.

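$$\boxed{M[\Sigma; \bar{\sigma}]_{G}} \qquad \begin{array}{rcl}
x[\Sigma; \bar{\sigma}]_{G} & = & x \\
(\lambda x^{T}.M)[\Sigma; \bar{\sigma}]_{G} & = & \lambda x^{T[\Sigma]}.(M[\Sigma; \bar{\sigma}]_{G}) \\
(M\,N)[\Sigma; \bar{\sigma}]_{G} & = & (M[\Sigma; \bar{\sigma}]_{G})\,(N[\Sigma; \bar{\sigma}]_{G}) \\
(\mathsf{quo}\langle\hat{\Gamma}\rangle M)[\Sigma; \bar{\sigma}]_{G} & = & \mathsf{quo}\langle\hat{\Gamma}[\Sigma; \bar{\sigma}']\rangle(M[\Sigma; (\bar{\sigma}, \mu, \bar{\sigma}')]_{G'}) \\
 & & \text{where } \bar{\sigma}' = \mathsf{destruct}_{G}(\hat{\Gamma}, \Sigma) \text{ and } G' = G - (\mathrm{dom}(\hat{\Gamma}) \cup \mathrm{rg}(\bar{\sigma}')) \\
(\mathsf{unq}_{k}M[\theta])[\Sigma; \bar{\sigma}]_{G} & = & \mathsf{unq}_{k}(M[\Sigma; \bar{\sigma} \uparrow k]_{G})[\theta[\Sigma; \bar{\sigma}]_{G}] \\
(\Lambda\gamma.M)[\Sigma; \bar{\sigma}]_{G} & = & \Lambda\gamma.(M[\Sigma; \bar{\sigma}]_{G}) \quad \text{if } \gamma \notin \mathrm{dom}(\Sigma) \text{ and } \gamma \notin \mathrm{FCV}(\Sigma) \\
(M@C)[\Sigma; \bar{\sigma}]_{G} & = & (M[\Sigma; \bar{\sigma}]_{G})@(C[\Sigma])
\end{array}$$

$$\boxed{\theta[\Sigma; \bar{\sigma}]_{G}} \qquad \begin{array}{rcl}
\bullet[\Sigma; \bar{\sigma}]_{G} & = & \bullet \\
(\theta, M)[\Sigma; \bar{\sigma}]_{G} & = & (\theta[\Sigma; \bar{\sigma}]_{G}), (M[\Sigma; \bar{\sigma}]_{G}) \\
(\theta, \mathbf{x})[\Sigma; \bar{\sigma}]_{G} & = & \begin{cases} (\theta[\Sigma; \bar{\sigma}]_{G}), \overrightarrow{y} & \text{if } \mathbf{x} := \overrightarrow{y} \in \mathrm{tail}(\bar{\sigma}) \\ (\theta[\Sigma; \bar{\sigma}]_{G}), \mathbf{x} & \text{otherwise} \end{cases}
\end{array}$$

$$\boxed{\Gamma[\Sigma; \bar{\sigma}]} \qquad \begin{array}{rcl}
\bullet[\Sigma; \bar{\sigma}] & = & \bullet \\
(\Gamma, x : T)[\Sigma; \bar{\sigma}] & = & \Gamma[\Sigma; \bar{\sigma}], x : T[\Sigma] \\
(\Gamma, \mathbf{x} : \gamma)[\Sigma; \bar{\sigma}] & = & \begin{cases} \Gamma[\Sigma; \bar{\sigma}], \overrightarrow{y} : C & \text{if } \mathbf{x} := \overrightarrow{y} \in \mathrm{tail}(\bar{\sigma}) \text{ and } \gamma := C \in \Sigma \\ \Gamma[\Sigma; \bar{\sigma}], \mathbf{x} : \gamma & \text{otherwise} \end{cases} \\
(\Gamma, \mu)[\Sigma; \bar{\sigma}] & = & \Gamma[\Sigma; \bar{\sigma} \uparrow 1], \mu
\end{array}$$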

#### Auxiliary functions

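$$\begin{array}{rcl}
\mathsf{destruct}_{G}((\Gamma, x : T), \Sigma) & = & \mathsf{destruct}_{G}(\Gamma, \Sigma) \\
\mathsf{destruct}_{G}((\Gamma, \mathbf{x} : \gamma), \Sigma) & = & \begin{cases} \bar{\sigma}, \mathbf{x} := \overrightarrow{x} & \text{if } \gamma := C \in \Sigma, \text{ where } \bar{\sigma} = \mathsf{destruct}_{G}(\Gamma, \Sigma) \\ & \text{and } \overrightarrow{x} = \mathsf{gensyms}_{G}(C, \mathrm{dom}(\Gamma) \cup \mathrm{rg}(\bar{\sigma})) \\ \mathsf{destruct}_{G}(\Gamma, \Sigma) & \text{otherwise} \end{cases} \\
\mathsf{destruct}_{G}((\Gamma, \mu), \Sigma) & = & \mathsf{destruct}_{G}(\Gamma, \Sigma), \mu \\[1ex]
\mathsf{gensyms}_{(G_v, G_s)}(\bullet, V) & = & \bullet \\
\mathsf{gensyms}_{(G_v, G_s)}((C, T), V) & = & \mathsf{gensyms}_{(G_v, G_s)}(C, V \cup \{x\}), x \\
 & & \text{where } x \text{ is the first element of } G_v \text{ such that } x \notin V \\
\mathsf{gensyms}_{(G_v, G_s)}((C, \gamma), V) & = & \mathsf{gensyms}_{(G_v, G_s)}(C, V \cup \{\mathbf{x}\}), \mathbf{x} \\
 & & \text{where } \mathbf{x} \text{ is the first element of } G_s \text{ such that } \mathbf{x} \notin V
\end{array}$$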

Fig. 7. Context substitutions and variable series substitutions
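The two auxiliary functions of Fig. 7 can be sketched in Python as follows. This is only an illustrative sketch under our own assumptions, not an implementation from the paper: named contexts are lists of `(name, "var", type)` and `(name, "series", context_variable)` entries plus a `LOCK` marker, a context substitution Σ is a dictionary, the generator G is fixed to the streams `v0, v1, ...` (ordinary variables) and `s0, s1, ...` (series variables), and the empty-context case of `destruct`, which the figure does not show, is assumed to return the empty substitution.

```python
from itertools import count

LOCK = "lock"  # stands for the lock µ in named contexts and series substitutions


def first_fresh(prefix, used):
    """First name of the stream prefix0, prefix1, ... that is not in `used`."""
    return next(f"{prefix}{i}" for i in count() if f"{prefix}{i}" not in used)


def dom(named_ctx):
    """Names bound by a named context (locks bind nothing)."""
    return {entry[0] for entry in named_ctx if entry != LOCK}


def rg(series_subst):
    """All names occurring in the ranges of a variable series substitution."""
    return {y for entry in series_subst if entry != LOCK for y in entry[1]}


def gensyms(ctx, used):
    """gensyms_G(C, V): one fresh name per entry of the context C."""
    if not ctx:
        return []
    *init, last = ctx
    # A context-variable entry ("cv", g) receives a series variable,
    # any other entry (a type) receives an ordinary variable.
    is_cv = isinstance(last, tuple) and last[0] == "cv"
    name = first_fresh("s" if is_cv else "v", used)
    return gensyms(init, used | {name}) + [name]


def destruct(named_ctx, sigma):
    """destruct_G(Γ, Σ): the variable series substitution generated for Γ."""
    if not named_ctx:
        return []  # assumed base case, not shown in Fig. 7
    *init, last = named_ctx
    rest = destruct(init, sigma)
    if last == LOCK:
        return rest + [LOCK]
    name, kind, payload = last
    if kind == "series" and payload in sigma:
        used = dom(init) | rg(rest)
        return rest + [(name, gensyms(sigma[payload], used))]
    return rest


# The example from the text: Γ = x: γ, x : ι, y: γ and Σ = γ := T1, γ'
# (series variables are written X and Y here).
gamma_ctx = [("X", "series", "γ"), ("x", "var", "ι"), ("Y", "series", "γ")]
sigma = {"γ": ["T1", ("cv", "γ'")]}
result = destruct(gamma_ctx, sigma)
print(result)
```

With these streams, `v0, s0` play the role of x′, **x**′ and `v1, s1` the role of y′, **y**′ in the example above; the second call to `gensyms` skips `v0` and `s0` because they already occur in the range of the substitution built so far.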

#### 4.3 Local Soundness and Completeness

Local soundness and local completeness are extended to polymorphic context types as follows. We use context substitution to obtain D′ in the local reduction pattern. In this pattern, we observe destruct(Γ, γ := C) = • because γ ∉ FCV(Γ), and hence we get Γ ⊢ M[γ := C; •] : T[γ := C]. For the local expansion pattern, we have to pick a context variable δ that is fresh against Γ.

Local Soundness

$$\dfrac{\dfrac{\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : T\end{array} \quad \gamma \notin \mathrm{FCV}(\Gamma)}{\Gamma \vdash \Lambda\gamma.M : \forall\gamma.T}}{\Gamma \vdash (\Lambda\gamma.M)@C : T[\gamma := C]} \quad\Longrightarrow\quad \begin{array}{c}\mathcal{D}'\\ \Gamma \vdash M[\gamma := C; \bullet] : T[\gamma := C]\end{array}$$

Local Completeness

$$\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : \forall\gamma.T\end{array} \quad\Longrightarrow\quad \dfrac{\dfrac{\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : \forall\gamma.T\end{array}}{\Gamma \vdash M@\delta : T[\gamma := \delta]} \quad \delta \notin \mathrm{FCV}(\Gamma)}{\Gamma \vdash \Lambda\delta.(M@\delta) : \forall\delta.(T[\gamma := \delta])}$$

As a result, we obtain an additional reduction rule for →<sub>β</sub> below.

$$\overline{(\Lambda\gamma.M)@C \to_{\beta} M[\gamma := C; \bullet]}$$
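As a small worked instance of this rule (our own illustration, not an example from the paper), take M = λx<sup>[γ ⊢ ι]</sup>.x and the one-element context C = ι. The step unfolds using the clauses of Fig. 7 for λ-abstractions and variables, with the variable generator omitted:

$$\begin{array}{rcl}
(\Lambda\gamma.\lambda x^{[\gamma \vdash \iota]}.x)@\iota & \to_{\beta} & (\lambda x^{[\gamma \vdash \iota]}.x)[\gamma := \iota; \bullet] \\
 & = & \lambda x^{[\gamma \vdash \iota][\gamma := \iota]}.(x[\gamma := \iota; \bullet]) \\
 & = & \lambda x^{[\iota \vdash \iota]}.x
\end{array}$$

The redex has type ([γ ⊢ ι] → [γ ⊢ ι])[γ := ι], that is, [ι ⊢ ι] → [ι ⊢ ι], which is also the type of the reduct. Since no quote occurs in M, destruct is never invoked and no series variable is generated.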

By using the substitution and context substitution lemmas, it is not hard to show subject reduction with regard to this β-reduction.

#### Theorem 1 (Subject Reduction).


Furthermore, β-reduction satisfies strong normalization and confluence. We only refer to confluence here because we will prove strong normalization in the next section.

Theorem 2 (Confluence). If Γ ⊢ M : T, M →<sup>∗</sup><sub>β</sub> N<sub>1</sub> and M →<sup>∗</sup><sub>β</sub> N<sub>2</sub>, then there exists a term N<sub>3</sub> such that N<sub>1</sub> →<sup>∗</sup><sub>β</sub> N<sub>3</sub> and N<sub>2</sub> →<sup>∗</sup><sub>β</sub> N<sub>3</sub>. The same also holds for well-typed explicit substitutions.

Proof. We use Newman's lemma [25]. We have strong normalization from Theorem 3 (in Section 5), and weak confluence is easy to show.

# 5 Parametric Reducibility and Strong Normalization

This section provides a proof of strong normalization of β-reduction in λ∀[]. A common approach to proving strong normalization of a modal calculus is to provide a reduction-preserving translation to another strongly normalizing calculus such as simply typed lambda calculi [15,1]. We tried this approach, reducing strong normalization of λ∀[] to that of System F [8]. However, it turned out not to be straightforward. Instead, we directly prove strong normalization of λ∀[] using reducibility in this paper. We follow Girard's parametric reducibility [8] to define reducibility with polymorphic contexts. We also adopted techniques from logical relation for Fitch-style modal calculi proposed by Valliappan et al. [31] to extend reducibility to our Fitch-style modal type theory. Along with these existing methods, our approach requires several non-trivial extensions of reducibility for contextual modal types, which we detail in this section.

We start with the definition of neutral terms and explicit substitutions.

#### Definition 2 (Neutral Terms and Explicit Substitutions).


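$$\overline{\bullet \text{ is neutral}} \qquad \dfrac{\theta \text{ is neutral} \quad M \text{ is neutral}}{\theta, M \text{ is neutral}} \qquad \dfrac{\theta \text{ is neutral}}{\theta, \mathbf{x} \text{ is neutral}}$$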

The definition of neutral terms is standard, while the one for neutral explicit substitutions is somewhat specific to λ<sup>∀</sup>[] but straightforward: θ is neutral iff all terms in θ are neutral. Then, we define reducibility candidates.

Definition 3 (Reducibility Candidates). Given a type T, let R be a set of derivable judgments of type T. We write R(Γ, M) iff Γ ⊢ M : T ∈ R. R is a reducibility candidate of T iff it satisfies all of the following properties.

CR0 If R(Γ, M) and Γ ≤ Γ′, then R(Γ′, M).


We also define a reducibility candidate of context C similarly.

We abbreviate reducibility candidate as RC. As a next step, we define reducibility candidate assignments to define reducibility with parameters. We only need to care about reducibility candidates of contexts because λ<sup>∀</sup>[] does not have polymorphic types.

RC assignment Σ̃ ::= • | Σ̃, γ : C := R (where R is an RC of C)

Σ̃ is well-formed if it does not contain duplicate context variables. We assume that all reducibility candidate assignments are well-formed. We write dom(Σ̃) for the set of context variables on the left side of := in Σ̃, and Σ for the context substitution that we can obtain by forgetting RCs in Σ̃.

On top of that, we define reducibility with parameters.

Definition 4 (Parametric Reducibility). Given an RC assignment Σ̃, a type T, and a context C where FCV(T) ⊆ dom(Σ̃) and FCV(C) ⊆ dom(Σ̃), we define Red<sub>T</sub>[Σ̃] and Red<sub>C</sub>[Σ̃], sets of derivable judgments of the type T[Σ] and the context C[Σ], respectively, as follows. We write Red<sub>T</sub>[Σ̃](Γ, M) iff Γ ⊢ M : T[Σ] ∈ Red<sub>T</sub>[Σ̃]; similarly for Red<sub>C</sub>[Σ̃](Γ, θ).


The definition for context variables is somewhat complicated. As (C′, γ)[Σ] = C′[Σ], D, we need two reducible explicit substitutions θ<sub>1</sub> and θ<sub>2</sub>, where θ<sub>1</sub> is for C′[Σ] and θ<sub>2</sub> is for D. Because D comes from the context variable γ, we use the RC R from Σ̃ to confirm that θ<sub>2</sub> is reducible.

Parametric reducibility is in fact a reducibility candidate, as the following lemma states.

Lemma 5. 1. Red<sub>T</sub>[Σ̃] is an RC of T. 2. Red<sub>C</sub>[Σ̃] is an RC of C.

We prove a few more auxiliary lemmas for the basic lemma. Firstly, we confirm that context substitution on types or context can be lifted to reducibility assignment.

$$\begin{array}{ll}\textbf{Lemma 6.} & 1.\ \textbf{Red}\_{T[\gamma:=C]}[\tilde{\Sigma}] = \textbf{Red}\_{T}[\tilde{\Sigma},\gamma \colon C[\Sigma] := \textbf{Red}\_{C}[\tilde{\Sigma}]].\\ & 2.\ \textbf{Red}\_{D[\gamma:=C]}[\tilde{\Sigma}] = \textbf{Red}\_{D}[\tilde{\Sigma},\gamma \colon C[\Sigma] := \textbf{Red}\_{C}[\tilde{\Sigma}]].\end{array}$$

Besides, we state three lemmas that correspond to introduction of function types, contextual modal types, and polymorphic context types.

Lemma 7. If Γ, x : S[Σ] ⊢ M : T[Σ] and Red<sub>T</sub>[Σ̃](Γ′, M[id<sub>Γ</sub>, x := N]) for any Γ′ and N such that Γ ≤ Γ′ and Red<sub>S</sub>[Σ̃](Γ′, N), then Red<sub>S→T</sub>[Σ̃](Γ, λx<sup>S</sup>.M).

Lemma 8. If Γ, µ, x⃗ : C[Σ] ⊢ M : T[Σ] and Red<sub>T</sub>[Σ̃](Γ<sub>2</sub>, M[id<sub>Γ<sub>1</sub></sub>, µ<sup>k</sup>, x⃗ := θ]) for any Γ<sub>1</sub>, Γ<sub>2</sub>, k and θ such that Γ ≤ Γ<sub>1</sub>, k : Γ<sub>1</sub> ◁ Γ<sub>2</sub> and Red<sub>C</sub>[Σ̃](Γ<sub>2</sub>, θ), then Red<sub>[C ⊢ T]</sub>[Σ̃](Γ, quo⟨x⃗ : C[Σ]⟩M).

Lemma 9. If Γ ⊢ M : T[Σ], γ ∉ FCV(Γ) ∪ FCV(Σ) ∪ dom(Σ), and Red<sub>T</sub>[Σ̃, γ : C := R](Γ, M[γ := C; •]) for any C, R such that R is an RC of C, then Red<sub>∀γ.T</sub>[Σ̃](Γ, Λγ.M).

We can prove these lemmas by CR3 and induction on the number of reduction steps of strongly normalizing terms/explicit substitutions.

Before the basic lemma, we define reducibility for named contexts. Although we would like something like Red<sub>Γ</sub>[Σ̃], this definition does not work because it carries no information on how a named context entry with a series variable **x**: γ will be replaced. Therefore, we also need to pass a variable series substitution, as in Red<sub>Γ</sub>[Σ̃, σ̄], in the same way as in context substitution for named contexts.

Definition 5 (Reducibility for Substitution). Given an RC assignment Σ̃, a named context Γ, and a series substitution σ̄ where FCV(Γ) ⊆ dom(Σ̃), we define Red<sub>Γ</sub>[Σ̃, σ̄], a set of derivable judgments of a named context ⊢ σ : Γ[Σ; σ̄] ⇒ ∆, as follows. We write Red<sub>Γ</sub>[Σ̃, σ̄](∆, σ) iff ⊢ σ : ∆ ⇒ Γ ∈ Red<sub>Γ</sub>[Σ̃, σ̄].


We use the variable series substitution in the third rule to generate a substitution for (**x**: γ)[Σ; σ̄] = x⃗ : C. Finally, we prove the basic lemma.

#### Lemma 10 (Basic Lemma).


Strong normalization is proved as a special case of the basic lemma, where we choose Σ, σ̄ and σ′ as identity substitutions, respectively.

Theorem 3 (Strong Normalization). If Γ ⊢ M : T, then M is strongly normalizing with regard to →<sub>β</sub>.

$$\begin{array}{llcl}
\text{Level-0 Types} & T^0, S^0 & ::= & \iota \mid S^0 \to T^0 \mid \bigcirc T^1 \\
\text{Level-0 Terms} & M^0, N^0 & ::= & x \mid \lambda x^{T^0}.M^0 \mid M^0\,N^0 \mid \mathsf{quo}\,M^1 \\
\text{Level-1 Types} & T^1, S^1 & ::= & \iota \mid S^1 \to T^1 \\
\text{Level-1 Terms} & M^1, N^1 & ::= & x \mid \lambda x^{T^1}.M^1 \mid M^1\,N^1 \mid \mathsf{unq}\,M^0 \\
\text{Named Contexts} & \Gamma^{\bigcirc}, \Delta^{\bigcirc} & ::= & \cdot \mid \Gamma^{\bigcirc}, x :^{0} T^0 \mid \Gamma^{\bigcirc}, x :^{1} T^1
\end{array}$$

$$\boxed{\Gamma^{\bigcirc} \vdash_{i} M^{i} : T^{i}} \quad (i \in \{0, 1\})$$

$$\dfrac{x :^{i} T^{i} \in \Gamma^{\bigcirc}}{\Gamma^{\bigcirc} \vdash_{i} x : T^{i}} \qquad \dfrac{\Gamma^{\bigcirc}, x :^{i} T^{i}_{1} \vdash_{i} M^{i} : T^{i}_{2}}{\Gamma^{\bigcirc} \vdash_{i} \lambda x^{T^{i}_{1}}.M^{i} : T^{i}_{1} \to T^{i}_{2}} \qquad \dfrac{\Gamma^{\bigcirc} \vdash_{i} M^{i} : T^{i}_{1} \to T^{i}_{2} \quad \Gamma^{\bigcirc} \vdash_{i} N^{i} : T^{i}_{1}}{\Gamma^{\bigcirc} \vdash_{i} M^{i}\,N^{i} : T^{i}_{2}}$$

$$\dfrac{\Gamma^{\bigcirc} \vdash_{1} M^{1} : T^{1}}{\Gamma^{\bigcirc} \vdash_{0} \mathsf{quo}\,M^{1} : \bigcirc T^{1}} \qquad \dfrac{\Gamma^{\bigcirc} \vdash_{0} M^{0} : \bigcirc T^{1}}{\Gamma^{\bigcirc} \vdash_{1} \mathsf{unq}\,M^{0} : T^{1}}$$

Fig. 8. Syntax and typing rules of λ<sup>○</sup> (two-level fragment)

# 6 Embedding Linear-Time Temporal Type Theory

In multi-stage computation, contextual modal types are known to overcome weak points of the linear-time temporal types of λ<sup>○</sup> by Davies [5] regarding type safety in the presence of mutable reference cells and/or run-time code evaluation [12,24,14]. However, simple contextual modal type theories, such as λ[], are known to be less expressive than linear-time temporal types. This is why polymorphic contexts have been explored in the literature: they add expressiveness to contextual modal types. It is then natural to ask whether polymorphic contexts are strong enough to express linear-time temporal types. This section proves that the answer is yes, by providing a sound translation from linear-time temporal types to λ<sup>∀</sup>[]. We first define a two-level fragment of λ<sup>○</sup> as a source language to simplify our embedding (Fig. 8). We call the fragment itself λ<sup>○</sup> in the rest of this paper. Then, we discuss the core insights of our embedding and give a formal definition of our embedding from λ<sup>○</sup> to λ<sup>∀</sup>[]. We also prove its soundness (the embedding preserves typing), while a proof that it also preserves semantics is left for future work.

λ<sup>○</sup> has two stages; level 1 is the future stage. We define types and terms for each level (and metavariables are indexed by 0 or 1). A temporal type ○T<sup>1</sup> denotes code for a future-stage value of type T<sup>1</sup>. Unlike contextual modal types, temporal types do not show contexts explicitly. Instead, typing judgments hold future-stage named contexts that implicitly represent the contexts of those code types. A typing judgment Γ<sup>○</sup> ⊢<sub>i</sub> M<sup>i</sup> : T<sup>i</sup> (where i = 0, 1) means typing at stage i, where Γ<sup>○</sup> includes variables of both levels. λ<sup>○</sup> also has syntax for quote and unquote as in λ<sup>∀</sup>[], but they are not annotated with named contexts and explicit substitutions. Typing rules do little with named contexts.

These differences lead to a difference in binding structure. For example, consider the λ<sup>○</sup>-term λf<sup>○T¹₁ → ○T¹₂</sup>.quo(λx<sup>T¹₁</sup>.unq(f (quo x))). In this term, the outer lambda binds the level-0 occurrence of f and the inner lambda binds the level-1 occurrence of x, although quo and unq are placed between the binders and the variable references. To embed λ<sup>○</sup> into λ<sup>∀</sup>[], we have to emulate this behavior of λ<sup>○</sup>.

We design our embedding from λ<sup>○</sup> to λ<sup>∀</sup>[] based on the following insights. First of all, we naturally embed quote and unquote of λ<sup>○</sup> into those of λ<sup>∀</sup>[] (by recovering missing annotations). Secondly, we can recover a hidden context of code types in λ<sup>○</sup> from the types of level-1 free variables. For example, in the judgment

$$x :^{0} \bigcirc\mathsf{int},\ y :^{1} \mathsf{int} \vdash_{0} \mathsf{quo}\,y : \bigcirc\mathsf{int},$$

the context of the type ○int (of quo y) should be int because the named context has a level-1 binding y :<sup>1</sup> int. As a result, ○int under x :<sup>0</sup> ○int, y :<sup>1</sup> int is embedded into [int ⊢ int]. Thirdly, recovered contexts of code types sometimes need to be extended. Let us consider the following judgment:

$$\cdot \vdash_{0} \lambda f^{\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}}.\mathsf{quo}(\lambda x^{\mathsf{int}}.\mathsf{unq}(f\,(\mathsf{quo}\,x))) : (\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}) \to \bigcirc(\mathsf{int} \to \mathsf{str}).$$

The hidden context of f is empty, and hence the type of f should be [• ⊢ int] → [• ⊢ str]. However, f is used inside the level-1 binder λx<sup>int</sup>, and hence this use of f should be typed as [int ⊢ int] → [int ⊢ str]. We need to extend the context of the code type because an abstraction under quo extends the level-1 context. Thus, the polymorphic context type ∀γ.[γ ⊢ int] → [γ ⊢ str] is more appropriate for f. In this way, polymorphic contexts allow us to extend the context of an argument of code type, according to where the argument is used.

The formal definition of our embedding is shown in Figure 9. Level-1 types are translated to λ<sup>∀</sup>[] types in a straightforward manner; the translation of level-0 types carries a context, which is used to signify the context of code types. If it translates a function type, we introduce a polymorphic context type to the argument type so that we can extend the context of the type later. For example, (○int → ○str) → ○(int → str) translates to (∀γ.(∀δ.[γ, δ ⊢ int]) → [γ ⊢ str]) → [• ⊢ int → str] under an empty context.
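The type translation just described can be sketched in Python as follows. This is a minimal illustration under our own encoding, not code from the paper: source types are strings (base types) or tuples `("fun", S, T)` and `("next", T)` for ○T, target types additionally use `("forall", g, T)` and `("box", ctx, T)` for [ctx ⊢ T], and fresh context variables are drawn from the hypothetical stream `g0, g1, ...`.

```python
from itertools import count


def tr1(t):
    """Level-1 types: base types are unchanged, arrows are translated pointwise."""
    if isinstance(t, str):
        return t
    _, s, u = t
    return ("fun", tr1(s), tr1(u))


def tr0(t, ctx, fresh):
    """Level-0 types, translated under the context `ctx` (a tuple of entries)."""
    if isinstance(t, str):
        return t
    if t[0] == "next":
        # A code type becomes a contextual modal type over the carried context.
        return ("box", ctx, tr1(t[1]))
    _, s, u = t
    g = next(fresh)
    # The argument type is translated under the context extended with a fresh
    # context variable and then generalized over that variable.
    arg = ("forall", g, tr0(s, ctx + (("cv", g),), fresh))
    return ("fun", arg, tr0(u, ctx, fresh))


def show(t):
    """Render a translated type; only what this example needs is supported."""
    if isinstance(t, str):
        return t
    if t[0] == "cv":
        return t[1]
    if t[0] == "forall":
        return f"∀{t[1]}.{show(t[2])}"
    if t[0] == "box":
        ctx = ", ".join(show(c) for c in t[1]) or "•"
        return f"[{ctx} ⊢ {show(t[2])}]"
    left = show(t[1])
    if not isinstance(t[1], str) and t[1][0] in ("fun", "forall"):
        left = f"({left})"
    return f"{left} → {show(t[2])}"


# (○int → ○str) → ○(int → str), translated under the empty context.
source = ("fun",
          ("fun", ("next", "int"), ("next", "str")),
          ("next", ("fun", "int", "str")))
fresh = (f"g{i}" for i in count())
translated = show(tr0(source, (), fresh))
print(translated)
```

Up to the names of the bound context variables (`g0` for γ and `g1` for δ), the printed type is the one given in the text.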

Before discussing term translation, we introduce intermediate named contexts Γ̃, an intermediate representation of embedded named contexts. Their structure is similar to that of named contexts in λ<sup>○</sup>, while their elements are variables and types of λ<sup>∀</sup>[]. We write |Γ̃|<sub>0</sub> for the level-0 fragment of Γ̃ and |Γ̃|<sub>1</sub> for the level-1 fragment of Γ̃. The relation Γ<sup>○</sup> ∼ Γ̃ means that Γ<sup>○</sup> can be translated into Γ̃. The point is that Γ<sup>○</sup> can be translated into different intermediate named contexts. For example, the λ<sup>○</sup> named context x :<sup>1</sup> T<sup>1</sup>, y :<sup>0</sup> ○S<sup>1</sup>, z :<sup>0</sup> ○S<sup>1</sup> can be translated to both x :<sup>1</sup> ⟦T<sup>1</sup>⟧, y :<sup>0</sup> [⟦T<sup>1</sup>⟧ ⊢ ⟦S<sup>1</sup>⟧], z :<sup>0</sup> [⟦T<sup>1</sup>⟧ ⊢ ⟦S<sup>1</sup>⟧] and x :<sup>1</sup> ⟦T<sup>1</sup>⟧, y :<sup>0</sup> [⟦T<sup>1</sup>⟧ ⊢ ⟦S<sup>1</sup>⟧], **x** :<sup>1</sup> γ, z :<sup>0</sup> [⟦T<sup>1</sup>⟧, γ ⊢ ⟦S<sup>1</sup>⟧] due to the last rule of ∼. We use this relation to prove the soundness theorem (Theorem 4) later.

Term embedding carries an intermediate named context for two purposes. Firstly, it is used to infer a named context and an explicit substitution for quote and unquote. Secondly, it is used to determine the missing context that we need to extend when using level-0 variables. The level-0 types in a named context always translate to polymorphic context types so that we can extend their context when those variables are used. diff(x, Γ̃) determines the missing context, defined as diff(x, (Γ̃, x :<sup>0</sup> T, ∆̃)) = rg(|∆̃|<sub>1</sub>) (or undefined otherwise).
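A minimal Python sketch of diff follows, under our own assumptions: an intermediate named context is a list of `(name, level, type)` triples, a series variable binding **x** :<sup>1</sup> γ is encoded as `("X", 1, "γ")`, the "undefined" case is represented by `None`, and when a name is bound more than once the rightmost level-0 binding is used (the text does not spell out this case).

```python
def diff(x, inter_ctx):
    """Return rg of the level-1 fragment to the right of the level-0 binding of x.

    Returns None when x has no level-0 binding in the context.
    """
    for i in range(len(inter_ctx) - 1, -1, -1):
        name, level, _ = inter_ctx[i]
        if name == x and level == 0:
            # Keep only the types (or context variables) of level-1 entries
            # that were added after x was bound.
            return tuple(t for _, lvl, t in inter_ctx[i + 1:] if lvl == 1)
    return None


# The second intermediate named context from the example in the text.
ctx = [
    ("x", 1, "⟦T1⟧"),
    ("y", 0, "[⟦T1⟧ ⊢ ⟦S1⟧]"),
    ("X", 1, "γ"),
    ("z", 0, "[⟦T1⟧, γ ⊢ ⟦S1⟧]"),
]
missing_for_y = diff("y", ctx)
missing_for_z = diff("z", ctx)
print(missing_for_y, missing_for_z)
```

Here the context of y has to be extended by γ, because the series variable was introduced after y, while nothing is missing for z.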

$$\boxed{[\![T^1]\!]} \qquad \begin{array}{rcl}
[\![\iota]\!] & = & \iota \\
[\![T^1_1 \to T^1_2]\!] & = & [\![T^1_1]\!] \to [\![T^1_2]\!]
\end{array}$$

$$\boxed{[\![M^1]\!]_{\tilde{\Gamma}}} \qquad \begin{array}{rcl}
[\![x]\!]_{\tilde{\Gamma}} & = & x \\
[\![\lambda x^{T^1}.M^1]\!]_{\tilde{\Gamma}} & = & \lambda x^{[\![T^1]\!]}.[\![M^1]\!]_{\tilde{\Gamma}, x :^{1} [\![T^1]\!]} \\
[\![M^1\,N^1]\!]_{\tilde{\Gamma}} & = & [\![M^1]\!]_{\tilde{\Gamma}}\ [\![N^1]\!]_{\tilde{\Gamma}} \\
[\![\mathsf{unq}\,M^0]\!]_{\tilde{\Gamma}} & = & \mathsf{unq}_1\,[\![M^0]\!]_{\tilde{\Gamma}}\,[\mathrm{dom}(|\tilde{\Gamma}|_1)]
\end{array}$$

$$\boxed{[\![T^0]\!]_{C}} \qquad \begin{array}{rcl}
[\![\iota]\!]_{C} & = & \iota \\
[\![T^0_1 \to T^0_2]\!]_{C} & = & (\forall\gamma.[\![T^0_1]\!]_{C, \gamma}) \to [\![T^0_2]\!]_{C} \quad \text{for fresh } \gamma \\
[\![\bigcirc T^1]\!]_{C} & = & [C \vdash [\![T^1]\!]]
\end{array}$$

$$\boxed{[\![M^0]\!]_{\tilde{\Gamma}}} \qquad \begin{array}{rcl}
[\![x]\!]_{\tilde{\Gamma}} & = & x@\mathrm{diff}(x, \tilde{\Gamma}) \\
[\![\lambda x^{T^0}.M^0]\!]_{\tilde{\Gamma}} & = & \lambda x^{T}.[\![M^0]\!]_{\tilde{\Gamma}, x :^{0} T} \quad \text{where } T = \forall\gamma.[\![T^0]\!]_{\mathrm{rg}(|\tilde{\Gamma}|_1), \gamma} \text{ for fresh } \gamma \\
[\![M^0\,N^0]\!]_{\tilde{\Gamma}} & = & [\![M^0]\!]_{\tilde{\Gamma}}\ (\Lambda\gamma.[\![N^0]\!]_{\tilde{\Gamma}, \mathbf{x} :^{1} \gamma}) \quad \text{for a fresh } \mathbf{x} \text{ and } \gamma \\
[\![\mathsf{quo}\,M^1]\!]_{\tilde{\Gamma}} & = & \mathsf{quo}\langle|\tilde{\Gamma}|_1\rangle[\![M^1]\!]_{\tilde{\Gamma}}
\end{array}$$

Intermediate Named Context Γ̃ ::= · | Γ̃, x :<sup>0</sup> T | Γ̃, x :<sup>1</sup> T | Γ̃, **x** :<sup>1</sup> γ

$$\boxed{\Gamma^{\bigcirc} \sim \tilde{\Gamma}} \qquad \overline{\cdot \sim \cdot} \qquad \dfrac{\Gamma^{\bigcirc} \sim \tilde{\Gamma}}{\Gamma^{\bigcirc}, x :^{0} T^0 \sim \tilde{\Gamma}, x :^{0} \forall\gamma.[\![T^0]\!]_{\mathrm{rg}(|\tilde{\Gamma}|_1), \gamma}}$$

$$\dfrac{\Gamma^{\bigcirc} \sim \tilde{\Gamma}}{\Gamma^{\bigcirc}, x :^{1} T^1 \sim \tilde{\Gamma}, x :^{1} [\![T^1]\!]} \qquad \dfrac{\Gamma^{\bigcirc} \sim \tilde{\Gamma}}{\Gamma^{\bigcirc} \sim \tilde{\Gamma}, \mathbf{x} :^{1} \gamma}$$

Finally, we prove the soundness of the translation.

#### Theorem 4 (Soundness of Embedding from λ<sup>○</sup>).

– If Γ<sup>○</sup> ⊢<sub>0</sub> M<sup>0</sup> : T<sup>0</sup> and Γ<sup>○</sup> ∼ Γ̃, then |Γ̃|<sub>0</sub> ⊢ ⟦M<sup>0</sup>⟧<sub>Γ̃</sub> : ⟦T<sup>0</sup>⟧<sub>rg(|Γ̃|<sub>1</sub>)</sub>.
– If Γ<sup>○</sup> ⊢<sub>1</sub> M<sup>1</sup> : T<sup>1</sup> and Γ<sup>○</sup> ∼ Γ̃, then |Γ̃|<sub>0</sub>, µ, |Γ̃|<sub>1</sub> ⊢ ⟦M<sup>1</sup>⟧<sub>Γ̃</sub> : ⟦T<sup>1</sup>⟧.

Proof (Sketch). By mutual induction on derivations in λ<sup>○</sup>.

We focus on the case of level-0 application. If M<sup>0</sup> = M<sup>0</sup><sub>1</sub> M<sup>0</sup><sub>2</sub>, then Γ<sup>○</sup> ⊢<sub>0</sub> M<sup>0</sup><sub>1</sub> : S<sup>0</sup> → T<sup>0</sup> and Γ<sup>○</sup> ⊢<sub>0</sub> M<sup>0</sup><sub>2</sub> : S<sup>0</sup> for some S<sup>0</sup>. By the induction hypothesis, we have the two λ<sup>∀</sup>[] judgments below.

$$\begin{array}{l}
\text{–}\quad |\tilde{\Gamma}|_0 \vdash [\![M^0_1]\!]_{\tilde{\Gamma}} : (\forall\gamma.[\![S^0]\!]_{\mathrm{rg}(|\tilde{\Gamma}|_1), \gamma}) \to [\![T^0]\!]_{\mathrm{rg}(|\tilde{\Gamma}|_1)} \\
\text{–}\quad |\tilde{\Gamma}, \mathbf{x} :^{1} \gamma|_0 \vdash [\![M^0_2]\!]_{\tilde{\Gamma}, \mathbf{x} :^{1} \gamma} : [\![S^0]\!]_{\mathrm{rg}(|\tilde{\Gamma}, \mathbf{x} :^{1} \gamma|_1)}
\end{array}$$

The second judgment holds because Γ<sup>○</sup> ∼ Γ̃, **x** :<sup>1</sup> γ can be derived from Γ<sup>○</sup> ∼ Γ̃. We can derive |Γ̃|<sub>0</sub> ⊢ Λγ.⟦M<sup>0</sup><sub>2</sub>⟧<sub>Γ̃, **x** :<sup>1</sup> γ</sub> : ∀γ.⟦S<sup>0</sup>⟧<sub>rg(|Γ̃, **x** :<sup>1</sup> γ|<sub>1</sub>)</sub> from the second judgment, considering that |Γ̃, **x** :<sup>1</sup> γ|<sub>0</sub> = |Γ̃|<sub>0</sub>. Then we can apply the term of the first judgment to this one, and we obtain |Γ̃|<sub>0</sub> ⊢ ⟦M<sup>0</sup><sub>1</sub>⟧<sub>Γ̃</sub> (Λγ.⟦M<sup>0</sup><sub>2</sub>⟧<sub>Γ̃, **x** :<sup>1</sup> γ</sub>) : ⟦T<sup>0</sup>⟧<sub>rg(|Γ̃|<sub>1</sub>)</sub>.

It is worth noting that this embedding requires multiple occurrences of context variables in a single context: as we have seen, (○int → ○str) → ○(int → str) translates to (∀γ.(∀δ.[γ, δ ⊢ int]) → [γ ⊢ str]) → [• ⊢ int → str], where the type [γ, δ ⊢ int] uses two context variables. This fact strongly suggests that context variables in λ<sup>∀</sup>[] are essential for embedding linear-time temporal types and hence also staged computation.

# 7 Related Work

Contextual Modal Type Theory. Early work on calculi for metaprogramming with explicit contexts includes λ<sup>poly</sup><sub>open</sub> by Kim et al. [12] and ν<sup>□</sup> by Nanevski and Pfenning [16]. On the one hand, λ<sup>poly</sup><sub>open</sub> has a Fitch-style-like modal type system with explicit contexts and is type safe in the presence of mutable references and run-time evaluation. On the other hand, ν<sup>□</sup> has a dual-context-like modal type system that is type sound with run-time evaluation. Both calculi use symbolic representations for named contexts of quoted code. As a result, names in quoted code are not subject to α-conversion. It is worth noting that both papers discuss context polymorphism to achieve flexibility for computation with contexts.

Nanevski et al. refined ν<sup>□</sup> to contextual modal type theory (CMTT) [17], allowing α-conversion for variables in quoted code. CMTT is very close to our λ[], although it employs a dual-context style formulation. We believe it is not difficult to apply polymorphic context types to dual-context CMTT, although we do not explore it in this paper. CMTT provides a basis for several metaprogramming languages [9,20,26]. We expect that λ<sup>∀</sup>[] will contribute to future designs of metaprogramming languages as well.

One notable difference between CMTT and λ[] is that CMTT has a named context inside a contextual modal type, instead of an (unnamed) context. This approach makes α-conversion somewhat complicated: a CMTT term box(x: T.x) has a type [x: T]T while an α-equivalent term box(y : T.y) has a bit different type [y : T]T. Instead, λ[] omits names from contexts in contextual modal types by identifying variables in a context by their positions; hence α-equivalent terms always have the same type in λ[].

Prior Work on Polymorphic Contexts. Contextual modal type systems have been applied to proof assistants [20,3,21,26]. Those proof assistants are designed to allow users to inspect code representations of proof terms using contextual modal types. In particular, Beluga [20,3] allows users to pattern match against code with polymorphic contexts, whereas λ<sup>∀</sup>[] allows only for generative metaprogramming. The prior proposals used an identity substitution id<sub>φ</sub> as a term representation of a context variable φ, whereas we use series variables for that purpose. Type-theoretic formalization of identity substitutions is examined by Puech's unpublished work [23]. He proposed dual-context and Fitch-style contextual modal type theories with polymorphic types and identity substitutions. However, a formalization with identity substitutions introduces a significant restriction: only one occurrence of a context variable is allowed in a single context. Suppose we allow multiple occurrences of context variables in a context with identity substitutions. In that case, we have a term like quo⟨γ, γ⟩(unq(x)[id<sub>γ</sub>]) that is ill-scoped because we do not know which γ is referred to by id<sub>γ</sub>. One might consider introducing a restriction that context variables do not duplicate in a context. However, it is still hard to avoid ill-scoped terms like (Λδ.quo⟨γ, δ⟩(unq(x)[id<sub>γ</sub>]))@γ, which reduces to the previous term. That is why we introduce series variables in λ<sup>∀</sup>[].

Context Subtyping. Rhiger [24] proposed a Fitch-style contextual modal type system λ[]<sup>&lt;</sup> that achieves safe code operation with mutable references and run-time evaluation. An interesting point of λ[]<sup>&lt;</sup> is that it employs linear-time flavored named contexts where a quote does not discard a future-stage context, and achieves flexibility of computation with contexts by introducing structural subtyping for contexts. Kiselyov et al. proposed a type system <NJ> with a notion of refined environment classifiers [14], which can be interpreted as an encapsulated representation of contexts. <NJ> is similar to λ[]<sup>&lt;</sup> in the sense that it employs classifier subtyping, although its subtyping is closer to nominal subtyping. They suggested bounded polymorphism over classifiers as a potential extension of <NJ>, which will allow a type like ∀γ.(∀δ γ.⟨T<sub>1</sub>⟩<sup>δ</sup> → ⟨T<sub>2</sub>⟩<sup>δ</sup>) → ⟨T<sub>1</sub> → T<sub>2</sub>⟩<sup>γ</sup>. Their bounded polymorphism is likely as expressive as the polymorphic contexts of λ<sup>∀</sup>[], and we are interested in the formal relation between them.

Pattern Matching against Code. Analytic metaprogramming that allows pattern matching against code values is considered beneficial and has been explored recently [18,28,9]. In particular, Mœbius [9] provides a contextual modal type system capable of pattern matching against open code with polymorphic types. It should be feasible to extend λ<sup>∀</sup>[] to allow pattern matching against code values, but this is left for future work.

Modal Types for Algebraic Effects and Handlers. ECMTT [32] is an interesting application of contextual modal types to algebraic effects and handlers [22]. It uses contexts to track the effects of computations and uses explicit substitutions to supply effect handlers. The authors mentioned that ECMTT needs some form of context polymorphism to support effect polymorphism. We expect that the polymorphic context types in λ<sup>∀</sup>[] will provide a basis for such an extension. Because our formulation allows multiple occurrences of context variables, we can describe a function that combines computations with different polymorphic effects, e.g., ∀γ, δ.[γ ⊢ T] → [δ ⊢ T] → [γ, δ ⊢ T].

Linear-Time Temporal Types. There have been several attempts at revealing the relation between contextual modal type theory and linear-time temporal type theory. However, not all of them achieved their goal. For example, Davies [5] pointed out that the translation from λ<sup>poly</sup><sub>open</sub> to λ<sup>○</sup>, proposed by Kim et al. [12], was not sound in some cases. Puech [23] also claimed a sound translation from λ ctx I to λ<sup>α</sup> [29], which is an extension of λ<sup>○</sup> with environment classifiers, but it did not work in some cases, either. His translation infers hidden contexts by introducing logic variables for unknown contexts and collecting constraints on those logic variables through typing derivations. Consequently, the following judgment fails to translate because f is used in two different scopes, and hence contradicting constraints for f are generated.

$$\begin{aligned} & f :^{0} \bigcirc T \to \bigcirc T,\ g :^{0} \bigcirc T \to \bigcirc T \to \bigcirc T,\ z :^{1} T \\ & \vdash g\ (\mathbf{quo}((\lambda x : T.\,\mathbf{unq}(f\ \mathbf{quo}\ x))\ z)\ (f\ \mathbf{quo}\ z)) \end{aligned}$$

These failing translations conversely indicate that the hypothesis by Davies [5] is right: a sound translation from λ<sup>○</sup> requires a full form of context polymorphism as in our λ<sup>∀</sup>[]. Kameyama et al. [10] provided a sound translation from a 2-level fragment of λ<sup>α</sup> to System F with products and a fixed point operator. Their translation uses polymorphic types to represent unknown contexts, similarly to our approach. However, their translation takes an approach different from ours. For example, a λ<sup>○</sup> type T → T → T is encoded to ∀γ.([γ ⊢ T] → ∀δ.([γ, δ ⊢ T] → [γ, δ ⊢ T])) if we apply their approach to λ<sup>∀</sup>[], whereas the same type is encoded to (∀γ.[γ ⊢ T]) → (∀γ.[γ ⊢ T]) → [• ⊢ T] by the approach discussed in Section 6. There are two major differences between their approach and ours. Firstly, their translation needs to insert coercion functions that extend contexts in types in conjunction with polymorphic types. In contrast, our approach achieves the same goal purely by polymorphic contexts, making the translation much more concise. Secondly, their source language supports richer expressions than λ<sup>○</sup>, including run-time evaluation and fixpoint. It is left for future work to figure out whether our approach can also embed such features of λ<sup>α</sup> into λ<sup>∀</sup>[].

# 8 Conclusion

This paper has proposed a novel contextual modal type theory λ<sup>∀</sup>[] with polymorphic contexts. It is novel in that it supports parametric polymorphic contexts and allows us to have multiple context variables in a single context. We have given its semantics by β-reduction and proved subject reduction, strong normalization, and confluence. We have also demonstrated a sound embedding from linear-time temporal type theory. We expect this result to indicate that λ<sup>∀</sup>[] is expressive enough to describe programs with staged computation.

We regard this work as a first step to establishing a mature modal type theory that reasons about hygienic binding operations provided by procedural macros of Scheme, Racket, and several other languages. Future work includes formal reasoning about the relation between contextual modal types and refined environment classifiers and developing a contextual modal type theory that can express first-class variable names.

Acknowledgements We would like to express gratitude to anonymous referees for their constructive feedback. This work is supported in part by JSPS KAKENHI Grant Number JP20H00582 and JST, the establishment of university fellowships towards the creation of science technology innovation, Grant Number JPMJFS2123.

# References



# A Complete Inference System for Skip-free Guarded Kleene Algebra with Tests

Todd Schmid<sup>1</sup>, Tobias Kappé<sup>2,3</sup>, and Alexandra Silva<sup>4</sup>

<sup>1</sup> University College London, London, UK

<sup>2</sup> Open University of the Netherlands, Heerlen, The Netherlands tobias.kappe@ou.nl

<sup>3</sup> ILLC, University of Amsterdam, Amsterdam, The Netherlands

<sup>4</sup> Cornell University, Ithaca, NY, USA

Abstract. Guarded Kleene Algebra with Tests (GKAT) is a fragment of Kleene Algebra with Tests (KAT) that was recently introduced to reason efficiently about imperative programs. In contrast to KAT, GKAT does not have an algebraic axiomatization, but relies on an analogue of Salomaa's axiomatization of Kleene Algebra. In this paper, we present an algebraic axiomatization and prove two completeness results for a large fragment of GKAT consisting of skip-free programs.

# 1 Introduction

Kleene algebra with tests (KAT) [25] is a logic for reasoning about semantics and equivalence of simple imperative programs. It extends Kleene Algebra (KA) with Boolean control flow, which enables encoding of conditionals and while loops.

KAT has been applied to verification tasks. For example, it was used in proof-carrying Java programs [23], in compiler optimization [27], and in file systems [8]. More recently, KAT was used for reasoning about packet-switched networks, serving as a core to NetKAT [4] and Probabilistic NetKAT [12,43].

The success of KAT in networking is partly due to its dual nature: it can be used to both specify and verify network properties. Moreover, the implementations of NetKAT and ProbNetKAT were surprisingly competitive with state-of-the-art tools [13,44]. Part of the surprise with the efficiency of these implementations is that the decision problem for equivalence in both KAT and NetKAT is PSPACE-complete [28,4]. Further investigations [42] revealed that the tasks performed in NetKAT only make use of a fragment of KAT. It turns out that the difficulty of deciding equivalence in KAT can largely be attributed to the non-deterministic nature of KAT programs. If one restricts to KAT programs that operate deterministically with respect to Boolean control flow, the associated decision problem is almost linear. This fragment of KAT was first identified in [29] and further explored as guarded Kleene algebra with tests (GKAT) [42].

The study in [42] proved that the decision problem for GKAT programs is almost linear, and proposed an axiomatization of equivalence. However, the axiomatization suffered from a serious drawback: it included a powerful uniqueness of solutions axiom (UA), which greatly encumbers algebraic reasoning in practice. In order to use (UA) to show that a pair of programs are equivalent, one needs to find a system of equations that they both satisfy. Even more worryingly, the axiomatization contained a fixed-point axiom with a side condition reminiscent of Salomaa's axiomatization for regular expressions, which is known to be non-algebraic and to impair the use of axiomatic reasoning in context (as substitution of atomic programs is no longer sound). The authors of [42] left as open questions whether (UA) can be derived from the other GKAT axioms and whether the non-algebraic side condition can be removed. Despite the attention GKAT has received in recent literature [39,48,41], these questions remain open.

In the present work, we offer a partial answer to the questions posed in [42]. We show that proving the validity of an equivalence in GKAT does not require (UA) if the pair of programs in question are of a particular form, which we call skip-free. This fragment of GKAT is expressive enough to capture a large class of programs, and it also provides a better basis for algebraic reasoning: we show that the side condition of the fixed-point axiom can be removed. Our inspiration to look at this fragment came from recent work of Grabmayer and Fokkink on the axiomatization of 1-free star expressions modulo bisimulation [15,14], an important stepping stone to solving a decades-open problem posed by Milner [32].

In a nutshell, our contribution is to identify a large fragment of GKAT, what we call the skip-free fragment, that admits an algebraic axiomatization. We axiomatize both bisimilarity and language semantics and provide two completeness proofs. The first proves completeness of skip-free GKAT modulo bisimulation [39], via a reduction to completeness of Grabmayer and Fokkink's system [15]. The second proves completeness of skip-free GKAT w.r.t. language semantics via a reduction to skip-free GKAT modulo bisimulation. We also show that equivalence proofs of skip-free GKAT expressions (for both semantics) embed in full GKAT.

The next section contains an introduction to GKAT and an overview of the open problems we tackle in the technical sections of the paper.

# 2 Overview

In this section we provide an overview of our results. We start with a motivating example of two imperative programs to discuss program equivalence as a verification technology. We then show how GKAT can be used to solve this problem and explore the open questions that we tackle in this paper.

Equivalence for Verification. In the game Fizz! Buzz! [35], players sit in a circle taking turns counting up from one. Instead of saying any number that is a multiple of 3, players must say "fizz", and multiples of 5 are replaced with "buzz". If the number is a multiple of both 3 and 5, the player must say "fizz buzz".

Imagine you are asked in a job interview to write a program that prints out the first 100 rounds of a perfect game of Fizz! Buzz!. You write the function fizzbuzz1 as given in Figure 1(i). Thinking about the interview later that day, you look up a solution, and you find fizzbuzz2, depicted in Figure 1(ii).

Fig. 1. Two possible specifications of the ideal Fizz! Buzz! player.

You suspect that fizzbuzz2 should do the same thing as fizzbuzz1, and after thinking it over for a few minutes, you realize your program could be transformed into the reference solution by a series of transformations that do not change its semantics:


Feeling somewhat more reassured, you ponder the three steps above. It seems like their validity is independent of the actual tests and actions performed by the code; for example, swapping the branches of an if - then - else - block while negating the test should be valid under any circumstances. This raises the question: is there a family of primitive transformations that can be used to derive valid ways of rearranging imperative programs? Furthermore, is there an algorithm to decide whether two programs are equivalent under these laws?
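As a concrete illustration, the two programs might look as follows in Python. This is only a hypothetical rendering: the names `fizzbuzz1` and `fizzbuzz2` come from the text, but the bodies, the `limit` parameter, and the choice to return the printed words as a list are assumptions made for this sketch. Running both and comparing outputs is a test on one interpretation, not the interpretation-independent equivalence the text asks about.

```python
def fizzbuzz1(limit=100):
    """Nested conditionals, with the increment repeated in every branch."""
    out, n = [], 1
    while n <= limit:
        if n % 3 == 0:
            if not (n % 5 == 0):
                out.append("fizz")
                n += 1
            else:
                out.append("fizzbuzz")
                n += 1
        else:
            if n % 5 == 0:
                out.append("buzz")
                n += 1
            else:
                out.append(str(n))
                n += 1
    out.append("done!")
    return out


def fizzbuzz2(limit=100):
    """Flattened conditionals, with the increment factored out of the branches."""
    out, n = [], 1
    while n <= limit:
        if n % 3 == 0 and n % 5 == 0:
            out.append("fizzbuzz")
        elif n % 3 == 0:
            out.append("fizz")
        elif n % 5 == 0:
            out.append("buzz")
        else:
            out.append(str(n))
        n += 1
    out.append("done!")
    return out


same_output = fizzbuzz1() == fizzbuzz2()
```

The two functions differ exactly by factoring out the common increment, swapping branches under a negated test, and merging nested conditionals.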

Enter GKAT. Guarded Kleene Algebra with Tests (GKAT) [42] has been proposed as a way of answering the questions above. Expressions in the language of GKAT model skeletons of imperative programs, where the exact meaning of tests and actions is abstracted. The laws of GKAT correspond to program transformations that are valid regardless of the semantics of tests and actions.

Formally, GKAT expressions are captured by a two-level grammar, generated by a finite set of tests T and a finite set of actions Σ, as follows:

$$\begin{aligned} \mathsf{BExp} & \ni b, c ::= 0 \mid 1 \mid t \in T \mid b \lor c \mid b \land c \mid \overline{b} \\ \mathsf{GExp} & \ni e, f ::= p \in \Sigma \mid b \mid e +_b f \mid e \cdot f \mid e^{(b)} \end{aligned}$$

BExp is the set of Boolean expressions, built from 0 (false), 1 (true), and primitive tests from T, and composed using ∨ (or), ∧ (and) and an overline (not). GExp is the set of GKAT expressions, built from tests (assert statements) and primitive actions p ∈ Σ. Here, e +<sub>b</sub> f is a condensed way of writing 'if b then e else f', and e<sup>(b)</sup> is shorthand for 'while b do e'; the operator · models sequential composition. By convention, the sequence operator · takes precedence over the operator +<sub>b</sub>.

Example 2.1. Abbreviating statements of the form print foo by simply writing foo, Figure 1(i) can be rendered as the GKAT expression

$$(n := 1) \cdot \left( \begin{array}{l} (\text{fizz} \cdot n{+}{+} \;+_{\overline{5|n}}\; \text{fizzbuzz} \cdot n{+}{+}) \;+_{3|n} \\ (\text{buzz} \cdot n{+}{+} \;+_{5|n}\; n \cdot n{+}{+}) \end{array} \right)^{(n \le 100)} \cdot \text{done!} \tag{1}$$

Similarly, the program in Figure 1(ii) gives the GKAT expression

$$(n := 1) \cdot \left( \left( \text{fizzbuzz} +_{3|n \wedge 5|n} \left( \text{fizz} +_{3|n} \left( \text{buzz} +_{5|n} n \right) \right) \right) \cdot n{+}{+} \right)^{(n \le 100)} \cdot \text{done!} \tag{2}$$

Semantics. A moment ago, we stated that GKAT equivalences are intended to witness program equivalence, regardless of how primitive tests and actions are interpreted. We make this more precise by recalling the relational semantics of GKAT programs [42].<sup>5</sup> The intuition behind this semantics is that if the possible states of the machine being programmed are modelled by some set S, then tests are predicates on S (each comprising the states where the test succeeds), and actions are relations on S (encoding the changes in state effected by the action).

Definition 2.2 ([42]). A (relational) interpretation is a triple σ = (S, eval, sat) where S is a set, eval : Σ → P(S × S) and sat : T → P(S). Each relational interpretation σ gives rise to a semantics ⟦−⟧<sub>σ</sub> : GExp → P(S × S), as follows:

$$\begin{aligned} [\![0]\!]_{\sigma} &= \emptyset & [\![\overline{b}]\!]_{\sigma} &= [\![1]\!]_{\sigma} \setminus [\![b]\!]_{\sigma} \\ [\![1]\!]_{\sigma} &= \{(s,s) : s \in S\} & [\![p]\!]_{\sigma} &= \mathsf{eval}(p) \\ [\![t]\!]_{\sigma} &= \{(s,s) : s \in \mathsf{sat}(t)\} & [\![e +_{b} f]\!]_{\sigma} &= [\![b]\!]_{\sigma} \circ [\![e]\!]_{\sigma} \cup [\![\overline{b}]\!]_{\sigma} \circ [\![f]\!]_{\sigma} \\ [\![b \wedge c]\!]_{\sigma} &= [\![b]\!]_{\sigma} \cap [\![c]\!]_{\sigma} & [\![e \cdot f]\!]_{\sigma} &= [\![e]\!]_{\sigma} \circ [\![f]\!]_{\sigma} \\ [\![b \vee c]\!]_{\sigma} &= [\![b]\!]_{\sigma} \cup [\![c]\!]_{\sigma} & [\![e^{(b)}]\!]_{\sigma} &= ([\![b]\!]_{\sigma} \circ [\![e]\!]_{\sigma})^{*} \circ [\![\overline{b}]\!]_{\sigma} \end{aligned}$$

Here we use ◦ for relation composition and <sup>∗</sup> for reflexive transitive closure.

Remark 2.3. If eval(p) is a partial function for every p ∈ Σ, then so is ⟦e⟧<sub>σ</sub> for each e. The above therefore also yields a semantics in terms of partial functions.

The relation ⟦e⟧<sub>σ</sub> contains the possible pairs of start and end states of the program e. For instance, the input-output relation ⟦e +<sub>b</sub> f⟧<sub>σ</sub> consists of the pairs in ⟦e⟧<sub>σ</sub> (resp. ⟦f⟧<sub>σ</sub>) where the start state satisfies b (resp. violates b).
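The clauses of Definition 2.2 can be executed directly when S is finite. The sketch below is an illustration under assumptions of our own: expressions are encoded as nested tuples with made-up tags (`"if"`, `"while"`, and so on), relations are Python sets of pairs, and the toy interpretation (a counter that can be incremented up to 3) is invented for the example.

```python
def compose(r, s):
    """Relational composition: first r, then s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def star(r, states):
    """Reflexive transitive closure of r on a finite set of states."""
    closure = {(s, s) for s in states}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def sem(e, S, ev, sat):
    """Relation denoted by e under the interpretation (S, ev, sat)."""
    ident = {(s, s) for s in S}

    def go(e):
        tag = e[0]
        if tag == "zero":
            return set()
        if tag == "one":
            return set(ident)
        if tag == "test":
            return {(s, s) for s in sat[e[1]]}
        if tag == "not":
            return ident - go(e[1])
        if tag == "and":
            return go(e[1]) & go(e[2])
        if tag == "or":
            return go(e[1]) | go(e[2])
        if tag == "act":
            return set(ev[e[1]])
        if tag == "seq":
            return compose(go(e[1]), go(e[2]))
        if tag == "if":      # ("if", b, e, f) encodes e +_b f
            b = go(e[1])
            return compose(b, go(e[2])) | compose(ident - b, go(e[3]))
        if tag == "while":   # ("while", b, e) encodes e^(b)
            b = go(e[1])
            return compose(star(compose(b, go(e[2])), S), ident - b)
        raise ValueError(f"unknown expression: {e!r}")

    return go(e)


# Toy interpretation (an assumption for this example).
S = {0, 1, 2, 3}
ev = {"inc": {(s, s + 1) for s in S if s < 3}, "reset": {(s, 0) for s in S}}
sat = {"lt3": {0, 1, 2}}

loop = ("while", ("test", "lt3"), ("act", "inc"))
swap_l = ("if", ("not", ("test", "lt3")), ("act", "reset"), ("act", "inc"))
swap_r = ("if", ("test", "lt3"), ("act", "inc"), ("act", "reset"))
```

Under this interpretation the loop relates every state to 3, and the two conditionals, which differ by swapping branches under a negated test, denote the same relation.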

Example 2.4. We could model the states of the machine running Fizz! Buzz! as pairs (m, ℓ), where m is the current value of the counter n, and ℓ is a list of words printed so far; the accompanying maps sat and eval are given by:

$$\begin{aligned} \mathsf{sat}(k|n) &= \{ (m, \ell) \in S : m \equiv 0 \bmod k \} \\ \mathsf{sat}(n \le k) &= \{ (m, \ell) \in S : m \le k \} \\ \mathsf{eval}(n{+}{+}) &= \{ ((m, \ell), (m+1, \ell)) : (m, \ell) \in S \} \\ \mathsf{eval}(n := k) &= \{ ((m, \ell), (k, \ell)) : (m, \ell) \in S \} \\ \mathsf{eval}(w) &= \{ ((m, \ell), (m, \ell w)) : (m, \ell) \in S \} \qquad (w \in \{ \text{fizz}, \text{buzz}, \text{fizzbuzz} \}) \\ \mathsf{eval}(n) &= \{ ((m, \ell), (m, \ell m)) : (m, \ell) \in S \} \end{aligned}$$

<sup>5</sup> A probabilistic semantics in terms of sub-Markov kernels is also possible [42].

For instance, the interpretation of n++ connects states of the form (m, ℓ) to states of the form (m + 1, ℓ), incrementing the counter by one and leaving the output unchanged. Similarly, print statements append the given string to the output.
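Since every action in Example 2.4 is a function on states, Remark 2.3 lets us run expressions as ordinary programs. The sketch below is an illustration only: the tuple encoding of expressions, the helper names `holds`, `run` and `seq`, and the treatment of done! as just another printed word are assumptions of this example, and tests used as standalone assert statements are left out.

```python
def sat(test, state):
    """Primitive tests: ("div", k) stands for k|n, ("le", k) for n <= k."""
    m, _ = state
    kind, k = test
    return m % k == 0 if kind == "div" else m <= k


def ev(action, state):
    """Primitive actions as functions on states (m, out)."""
    m, out = state
    if action == "n++":
        return (m + 1, out)
    if action == "n:=1":
        return (1, out)
    if action == "n":                  # print the current counter value
        return (m, out + (str(m),))
    return (m, out + (action,))        # print a fixed word such as "fizz"


def holds(b, state):
    """Evaluate a Boolean expression in a state."""
    if b[0] == "not":
        return not holds(b[1], state)
    if b[0] == "and":
        return holds(b[1], state) and holds(b[2], state)
    return sat(b, state)


def run(e, state):
    """Final state reached by running e from `state`."""
    tag = e[0]
    if tag == "act":
        return ev(e[1], state)
    if tag == "seq":
        return run(e[2], run(e[1], state))
    if tag == "if":                    # ("if", b, e, f) encodes e +_b f
        return run(e[2] if holds(e[1], state) else e[3], state)
    if tag == "while":                 # ("while", b, e) encodes e^(b)
        while holds(e[1], state):
            state = run(e[2], state)
        return state
    raise ValueError(f"unknown expression: {e!r}")


def seq(*es):
    """Right-nested sequential composition of one or more expressions."""
    e = es[-1]
    for x in reversed(es[:-1]):
        e = ("seq", x, e)
    return e


act = lambda a: ("act", a)
d3, d5, le100, inc = ("div", 3), ("div", 5), ("le", 100), act("n++")

prog1 = seq(act("n:=1"), ("while", le100, ("if", d3,
            ("if", ("not", d5), seq(act("fizz"), inc), seq(act("fizzbuzz"), inc)),
            ("if", d5, seq(act("buzz"), inc), seq(act("n"), inc)))), act("done!"))
prog2 = seq(act("n:=1"), ("while", le100, seq(
            ("if", ("and", d3, d5), act("fizzbuzz"),
             ("if", d3, act("fizz"), ("if", d5, act("buzz"), act("n")))), inc)),
            act("done!"))

final1 = run(prog1, (0, ()))
final2 = run(prog2, (0, ()))
```

Here `prog1` and `prog2` follow the shapes of expressions (1) and (2); agreement of `final1` and `final2` confirms equivalence for this one interpretation and start state only.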

On the one hand, this parameterized semantics shows that programs in the GKAT syntax can be given a semantics that corresponds to the intended meaning of their actions and tests. On the other hand, it allows us to quantify over all possible interpretations, and thus abstract from the meaning of the primitives.

As it happens, two expressions have the same relational semantics under any interpretation if and only if they have the same language semantics [42], i.e., in terms of languages of guarded strings as used in KAT [25]. Since equivalence under the language semantics is efficiently decidable [42], so is equivalence under all relational interpretations. The decision procedure in [42] uses bisimulation and known results from automata theory. These techniques are good for mechanization but hide the algebraic structure of programs. To expose this structure, algebraic laws of GKAT program equivalence were studied.

Program transformations. GKAT programs are (generalized) regular expressions, which are intuitive to reason about and for which many syntactic equivalences are known and explored. In [42], a set of sound axioms e ≡ f such that ⟦e⟧<sub>σ</sub> = ⟦f⟧<sub>σ</sub> for all σ was proposed, and it was shown that these can be used to prove a number of useful facts about programs. For instance, the following two equivalences are axioms of GKAT:

$$e \cdot g +_b f \cdot g \equiv (e +_b f) \cdot g \qquad \qquad f +_{\overline{b}} e \equiv e +_b f$$

The first of these says that common code at the tail end of branches can be factored out, while the second says that the code in branches of a conditional can be swapped, as long as we negate the test. Returning to our running example, if we apply the first law to (1) three times (once for each guarded choice) and the second law once (to the choice guarded by the negated test), we obtain

$$(n := 1) \cdot \left( \left( \begin{array}{l} (\text{fizzbuzz} +_{5|n} \text{fizz}) \;+_{3|n} \\ (\text{buzz} +_{5|n} n) \end{array} \right) \cdot n{+}{+} \right)^{(n \le 100)} \cdot \text{done!} \tag{3}$$

Finally, we can apply (e +<sub>b</sub> f) +<sub>c</sub> (g +<sub>b</sub> h) ≡ e +<sub>b∧c</sub> (f +<sub>c</sub> (g +<sub>b</sub> h)), which is provable from the axioms of GKAT, to transform (3) into (2).
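A quick sanity check of this derived law is to ask which of the four branches e, f, g, h each side selects for every truth assignment to b and c. The snippet below does exactly that; it is a check of branch selection under fixed truth values, with function names of our own choosing, and not a derivation from the GKAT axioms.

```python
from itertools import product


def lhs(b, c):
    """(e +_b f) +_c (g +_b h): test c first, then b inside either branch."""
    return ("e" if b else "f") if c else ("g" if b else "h")


def rhs(b, c):
    """e +_{b and c} (f +_c (g +_b h)): one flat cascade of tests."""
    if b and c:
        return "e"
    if c:
        return "f"
    return "g" if b else "h"


agree = all(lhs(b, c) == rhs(b, c) for b, c in product([False, True], repeat=2))
```

With b = 5|n, c = 3|n and the branches fizzbuzz, fizz, buzz, n, this is the step that flattens the nested conditional of (3) into the cascade of (2).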

Being able to transform one GKAT program into another using the axioms of GKAT is useful, but the question arises: do the axioms capture all equivalences that hold? More specifically, are the axioms of GKAT powerful enough to prove that e ≡ f whenever ⟦e⟧<sub>σ</sub> = ⟦f⟧<sub>σ</sub> holds for all σ?

In [42], a partial answer to the above question is provided: if we extend the laws of GKAT with the uniqueness axiom (UA), then the resulting set of axioms is sound and complete w.r.t. the language semantics. The problem with this is that (UA) is not really a single axiom, but rather an axiom scheme, which makes both its presentation and application somewhat unwieldy.

To properly introduce (UA), we need the following notion.

Definition 2.5. A left-affine system is defined by expressions e<sub>11</sub>, …, e<sub>nn</sub> ∈ GExp and f<sub>1</sub>, …, f<sub>n</sub> ∈ GExp, along with tests b<sub>11</sub>, …, b<sub>nn</sub> ∈ BExp. A sequence of expressions s<sub>1</sub>, …, s<sub>n</sub> ∈ GExp is said to be a solution to this system if

$$s_i \equiv e_{i1} \cdot s_1 +_{b_{i1}} e_{i2} \cdot s_2 +_{b_{i2}} \cdots +_{b_{i(n-1)}} e_{in} \cdot s_n +_{b_{in}} f_i \quad (\forall i \le n).$$

Here, the operations +<sub>b<sub>ij</sub></sub> associate to the right.

A left-affine system is called guarded if no eij that appears in the system successfully terminates after reading an atomic test. In other words, each coefficient denotes a productive program, meaning it must execute some action before successfully terminating—we refer to Section 7.3 for more details.

Stated fully, (UA) says that if expressions s<sub>1</sub>, …, s<sub>n</sub> and t<sub>1</sub>, …, t<sub>n</sub> are solutions to the same guarded left-affine system, then s<sub>i</sub> ≡ t<sub>i</sub> for 1 ≤ i ≤ n.

On top of the infinitary nature of (UA), the side condition demanding guardedness prevents purely algebraic reasoning: replacing action symbols in a valid GKAT equation with arbitrary GKAT expressions might yield an invalid equation! The situation is analogous to the empty word property used by Salomaa [37] to axiomatize equivalence of regular expressions. The side condition of guardedness appearing in (UA) is inherited from another axiom of GKAT, the fixed-point axiom, which in essence is the unary version of this axiom scheme and explicitly defines the solution of one guarded left-affine equation as a while loop.

$$g \equiv e \cdot g +_b f \implies g \equiv e^{(b)} \cdot f \qquad \text{if } e \text{ is guarded.}$$

Remark 2.6. Part of the problem of the uniqueness axiom is that the case for general n does not seem to follow easily from the case where n = 1. The problem here is that, unlike the analogous situation for Kleene algebra, there is no general method to transform a left-affine system with n + 1 unknowns into one with n unknowns [29], even if this is possible in certain cases [42].

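The open questions. We are motivated by two open questions from [42]: whether (UA) can be derived from the other GKAT axioms, and whether the non-algebraic side condition can be removed.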


This paper. Our main contribution is to show that, in a particular fragment of GKAT, both questions can be answered in the positive (see Figure 2).

In Section 3, we present what we call the skip-free fragment of GKAT, consisting of programs that do not contain assert statements in the body (other than assert false); in other words, Boolean statements are restricted to control statements. For this fragment, we show that the axiom scheme (UA) can be avoided entirely. In fact, this is true for language semantics (as first introduced in [42]) as well as for the bisimulation semantics of [39].

Fig. 2. Axioms for the language semantics of skip-free GKAT (in addition to Boolean algebra axioms for tests, see Fig. 3). If the axiom marked † is omitted, the remaining axioms axiomatize a finer semantics, bisimilarity.

In Section 4, we provide a bridge to a recent result in process algebra. In the 80s, Milner offered an alternative interpretation of regular expressions [32], as what he called star behaviours. Based on work of Salomaa from the 1960s [37], Milner proposed a sound axiomatization of the algebra of star behaviours, but left completeness an open problem. After 38 years, it was recently solved by Clemens Grabmayer [14] following up on his joint work with Wan Fokkink showing that a suitable restriction of Milner's axioms is complete for the one-free fragment of regular expressions modulo bisimulation [15]. We leverage their work with an interesting embedding of skip-free GKAT into the one-free regular expressions.

This leads to two completeness results. In Section 5, we start by focusing on the bisimulation semantics of the skip-free fragment, and then in Section 6 expand our argument to its language semantics. More precisely, we first provide a reduction of the completeness of skip-free GKAT up to bisimulation to the completeness of Grabmayer and Fokkink's 1-free regular expressions modulo bisimulation [15]. We then provide a reduction of the completeness of skip-free GKAT modulo language semantics to the completeness of skip-free GKAT modulo bisimulation via a technique inspired by the tree pruning approach of [39].

Finally, in Section 7, we connect our semantics of skip-free GKAT expressions to the established semantics of full GKAT. We also connect the syntactic proofs between skip-free GKAT expressions in both our axiomatization and the existing one. In conjunction with the results of Sections 5 and 6, the results in Section 7 make a significant step towards answering the question of whether the axioms of GKAT give a complete description of program equivalence, in the positive.

Proofs appear in the full version [40].

# 3 Introducing Skip-free GKAT

The axiom scheme (UA) can be avoided entirely in a certain fragment of GKAT, both for determining bisimilarity and language equivalence. In this section, we give a formal description of the expressions in this fragment and their semantics.

Skip-free expressions. The fragment of GKAT in focus is the one that excludes sub-programs that may accept immediately, without performing any action. Since these programs can be "skipped" under certain conditions, we call the fragment that avoids them skip-free. Among others, it prohibits sub-programs of the form assert b for b ≠ false, but also while false do p, which is equivalent to assert true.

Fig. 3. The axioms of Boolean algebra [18].

Definition 3.1. Given a set Σ of atomic actions, the set GExp<sup>−</sup> of skip-free GKAT expressions is given by the grammar

$$\mathsf{GExp}^- \ni e_1, e_2 ::= 0 \mid p \in \Sigma \mid e_1 +_b e_2 \mid e_1 \cdot e_2 \mid e_1^{(b)} e_2$$

where b ranges over the Boolean algebra expressions BExp.

Unlike full GKAT, in skip-free GKAT the loop construct is treated as a binary operation, analogous to Kleene's original star operation [22], which was also binary. This helps us avoid loops of the form e<sup>(b)</sup>, which can be skipped when b does not hold. The expression e<sub>1</sub><sup>(b)</sup>e<sub>2</sub> corresponds to e<sub>1</sub><sup>(b)</sup> · e<sub>2</sub> in GKAT.

Example 3.2. Using the same notational shorthand as in Example 2.1, the block of code in Figure 1(ii) can be cast as the skip-free GKAT expression

$$(n := 1) \cdot \big( (\text{fizzbuzz} +_{3|n \wedge 5|n} (\text{fizz} +_{3|n} (\text{buzz} +_{5|n} n))) \cdot n{+}{+} \big)^{(n \le 100)} (\text{done!})$$

Note how we use a skip-free loop of the form e<sub>1</sub><sup>(b)</sup>e<sub>2</sub> instead of the looping construct e<sub>1</sub><sup>(b)</sup> before concatenating with e<sub>2</sub>, as was done for GKAT.

# 3.1 Skip-free Semantics

There are three natural ways to interpret skip-free GKAT expressions: as automata, as behaviours, and as languages. <sup>6</sup> After a short note on Boolean algebra, we shall begin with the automaton interpretation, also known as the small-step semantics, from which the other two can be derived.

Boolean algebra. To properly present our automata, we need to introduce one more notion. Boolean expressions BExp are a syntax for elements of a Boolean algebra, an algebraic structure satisfying the equations in Fig. 3. When a Boolean algebra is freely generated from a finite set of basic tests (T in the case of BExp), it has a finite set At of nonzero minimal elements called atoms. Atoms are in one-to-one correspondence with sets of tests, and the Boolean algebra is isomorphic to P(At), the set of subsets of At, equipped with ∨ = ∪, ∧ = ∩, and complement given by At \ (−). In the context of programming, one can think of an atom as a complete description of the machine state, saying which tests are true and which are false. We will denote atoms by the Greek letters α and β, sometimes with indices. Given a Boolean expression b ∈ BExp and an atom α ∈ At we say that α entails b, written α ≤ b, whenever $\overline{\alpha} \vee b = 1$, or equivalently α ∨ b = b.
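The correspondence between atoms and sets of tests is easy to make concrete. In the sketch below, an atom is represented by the frozenset of tests it makes true, and α ≤ b is decided by evaluating b under that assignment; the tuple encoding of Boolean expressions and the function names `atoms`, `entails` and `denote` are assumptions of this illustration.

```python
from itertools import combinations


def atoms(tests):
    """All atoms over `tests`, each given as the frozenset of tests it makes true."""
    tests = sorted(tests)
    return [frozenset(c) for r in range(len(tests) + 1)
            for c in combinations(tests, r)]


def entails(alpha, b):
    """alpha <= b: evaluate b with exactly the tests in alpha set to true."""
    tag = b[0]
    if tag == "zero":
        return False
    if tag == "one":
        return True
    if tag == "test":
        return b[1] in alpha
    if tag == "not":
        return not entails(alpha, b[1])
    if tag == "and":
        return entails(alpha, b[1]) and entails(alpha, b[2])
    if tag == "or":
        return entails(alpha, b[1]) or entails(alpha, b[2])
    raise ValueError(f"unknown Boolean expression: {b!r}")


def denote(b, tests):
    """Image of b in P(At): the set of atoms that entail b."""
    return {alpha for alpha in atoms(tests) if entails(alpha, b)}


T = {"s", "t"}
s, t = ("test", "s"), ("test", "t")
```

With two tests there are four atoms, and `denote` turns ∨, ∧ and complement into union, intersection and set difference from At, as described above.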

<sup>6</sup> We will connect these to the relational semantics from Definition 2.2 in Section 7.


Fig. 4. The small-step semantics of skip-free GKAT expressions.

Automata. Throughout the paper, we use the notation • + S where S is a set and • is a symbol to denote the disjoint union (coproduct) of {•} and S.

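The small-step semantics of a skip-free GKAT expression uses a special type of deterministic automaton. A skip-free automaton is a pair (X, h), where X is a set of states and h: X → (⊥ + Σ × (✓ + X))<sup>At</sup> is a transition structure. At every x ∈ X and for any α ∈ At, one of three things can happen:

1. h(x)(α) = (p, y), written x −α|p→ y: the state x under α performs the action p and moves to the state y;
2. h(x)(α) = (p, ✓), written x −α|p→ ✓: the state x under α performs the action p and terminates successfully;
3. h(x)(α) = ⊥, written x ↓ α: the state x under α terminates with failure.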


Definition 3.3 (Automaton of expressions). We equip the set GExp<sup>−</sup> of all skip-free GKAT expressions with an automaton structure (GExp<sup>−</sup>, ∂) given in Fig. 4, representing step-by-step execution. Given e ∈ GExp<sup>−</sup>, we denote the set of states reachable from e by ⟨e⟩ and call this the small-step semantics of e.

The small-step semantics of skip-free GKAT expressions is inspired by Brzozowski's derivatives [7], which provide an automata-theoretic description of the step-by-step execution of a regular expression. Our first lemma tells us that, like regular expressions, skip-free GKAT expressions correspond to finite automata.

Lemma 3.4. For any e ∈ GExp<sup>−</sup>, ⟨e⟩ has finitely many states.

Example 3.5. The automaton that arises from the program fizzbuzz2 is below, with a = n ≤ 100, b = 3|n, and c = 5|n. The expression e is the same as in Example 3.2, e<sub>1</sub> is the same as e but without the action n := 1 in front, and e<sub>2</sub> = n++ · e<sub>1</sub>. We also adopt the convention of writing x −b|p→ x′ where b ∈ BExp to represent all transitions x −α|p→ x′ where α ≤ b.

The automaton interpretation of a skip-free GKAT expression (its small-step semantics) provides an intuitive visual depiction of the details of its execution. This is a useful view on the operational semantics of expressions, but sometimes one might want a more precise description of the global behaviour of the program. The remaining two interpretations of skip-free GKAT expressions capture two denotational semantics. The finer one, bisimilarity, distinguishes the branching created by how states respond to atomic tests, which actions can be performed, and when successful termination and crashes occur. The coarser one, language semantics, assigns to each expression a language of traces capturing all sequences of actions that lead to successful termination. The key difference between these two semantics is their ability to distinguish programs that crash early in the execution from programs that crash later; this is evident in the axiomatizations of both semantics. We start by presenting the language semantics, as this is the more traditional one associated with GKAT (and regular) expressions.

Language semantics. Formally, a (skip-free) guarded trace is a nonempty string of the form α<sub>1</sub>p<sub>1</sub> · · · α<sub>n</sub>p<sub>n</sub>, where each α<sub>i</sub> ∈ At and p<sub>i</sub> ∈ Σ. Intuitively, each α<sub>i</sub> captures the state of program variables needed to execute program action p<sub>i</sub>, and the execution of each p<sub>i</sub> except the last yields a new program state α<sub>i+1</sub>. A skip-free guarded language is a set of guarded traces.

Skip-free guarded languages should be thought of as sets of strings denoting successfully terminating computations.

Definition 3.6 (Language acceptance). In a skip-free automaton (X, h) with a state x ∈ X, the language accepted by x is the skip-free guarded language

$$\mathcal{L}(x,(X,h)) = \{ \alpha_1 p_1 \cdots \alpha_n p_n \mid x \xrightarrow{\alpha_1 \mid p_1} x_1 \to \cdots \to x_n \xrightarrow{\alpha_n \mid p_n} \checkmark \}$$

If (X, h) is clear from context, we will simply write L(x) instead of L(x,(X, h)). If L(x) = L(y), we write x ∼<sup>L</sup> y and say that x and y are language equivalent.
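To illustrate Definition 3.6, the following sketch enumerates the accepted guarded traces up to a length bound. The dictionary encoding of a skip-free automaton, the marker `ACCEPT`, and the two-state automaton `h` are assumptions made for illustration only; a missing atom in a state's dictionary stands for rejection.

```python
ACCEPT = "<accept>"  # hypothetical marker for successful termination

def language(h, x, max_len):
    """Guarded traces of length <= max_len accepted from state x.

    h maps each state to a dict {atom: (action, target)}.
    A trace is a tuple of (atom, action) pairs.
    """
    traces = set()
    frontier = [(x, ())]
    for _ in range(max_len):
        next_frontier = []
        for state, prefix in frontier:
            for alpha, (p, target) in h[state].items():
                word = prefix + ((alpha, p),)
                if target == ACCEPT:
                    traces.add(word)          # run ends in acceptance
                else:
                    next_frontier.append((target, word))
        frontier = next_frontier
    return traces

# A small illustrative automaton over the atoms "a" and "na" (not a).
h = {
    "x1": {"a": ("p", "x2"), "na": ("q", ACCEPT)},
    "x2": {"a": ("p", "x2"), "na": ("q", ACCEPT)},
}
```

Since the full language may be infinite, the bound `max_len` only yields a finite approximation; comparing approximations for growing bounds gives evidence for, but not a proof of, language equivalence.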

Each skip-free GKAT expression is a state in the automaton of expressions (Definition 3.3) and therefore accepts a language. The language accepted by a skip-free GKAT expression is the set of successful runs of the program it denotes. Analogously to GKAT, we can describe this language inductively.

Lemma 3.7. Given an expression e ∈ GExp<sup>−</sup>, the language accepted by e in (GExp<sup>−</sup>, ∂), i.e., L(e) = L(e,(GExp<sup>−</sup>, ∂)) can be characterized as follows:

$$\begin{aligned} \mathcal{L}(0) = \emptyset \quad \mathcal{L}(p) = \{\alpha p \mid \alpha \in \mathsf{At} \} \quad \mathcal{L}(e\_1 +\_b e\_2) = b\mathcal{L}(e\_1) \cup \bar{b}\mathcal{L}(e\_2), \\ \mathcal{L}(e\_1 \cdot e\_2) = \mathcal{L}(e\_1) \cdot \mathcal{L}(e\_2) \quad \mathcal{L}(e\_1^{(b)} e\_2) = \bigcup\_{n \in \mathbb{N}} (b\mathcal{L}(e\_1))^n \cdot \bar{b}\mathcal{L}(e\_2) \end{aligned}$$

Here, we write bL = {αpw ∈ L | α ≤ b} and L<sub>1</sub> · L<sub>2</sub> = {wx : w ∈ L<sub>1</sub>, x ∈ L<sub>2</sub>}, while L<sup>0</sup> = {ε} (where ε denotes the empty word) and L<sup>n+1</sup> = L · L<sup>n</sup>.

Lemma 3.7 provides a way of computing the language of an expression e without having to generate the automaton for e.
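The clauses of Lemma 3.7 translate directly into a recursive procedure. The sketch below computes the traces of length at most k; the tuple encoding of expressions, the representation of a test as the set of atoms below it, and the function names are our own assumptions.

```python
def guard(b, L):
    """bL: the traces of L whose first atom lies in the test b (a set of atoms)."""
    return {w for w in L if w[0][0] in b}

def concat(L1, L2, k):
    """L1 . L2, keeping only traces of length at most k."""
    return {w + x for w in L1 for x in L2 if len(w) + len(x) <= k}

def lang(e, At, k):
    """Traces of length <= k in L(e), following the clauses of Lemma 3.7."""
    tag = e[0]
    if tag == "zero":
        return set()
    if tag == "act":                       # ("act", p)
        return {((alpha, e[1]),) for alpha in At} if k >= 1 else set()
    if tag == "ite":                       # ("ite", b, e1, e2) encodes e1 +_b e2
        _, b, e1, e2 = e
        return guard(b, lang(e1, At, k)) | guard(At - b, lang(e2, At, k))
    if tag == "seq":                       # ("seq", e1, e2)
        _, e1, e2 = e
        return concat(lang(e1, At, k), lang(e2, At, k), k)
    if tag == "loop":                      # ("loop", b, e1, e2) encodes e1^(b) e2
        _, b, e1, e2 = e
        body = guard(b, lang(e1, At, k))
        exit_ = guard(At - b, lang(e2, At, k))
        result, power = set(), {()}        # power holds (b L(e1))^n, starting at n = 0
        while power:                       # stops once every trace exceeds length k
            result |= concat(power, exit_, k)
            power = concat(power, body, k)
        return result
    raise ValueError(f"unknown expression: {tag}")

At = frozenset({"a", "na"})
while_a = ("loop", frozenset({"a"}), ("act", "p"), ("act", "q"))
forever = ("loop", At, ("act", "p"), ("act", "q"))
```

For `forever`, whose guard is the test 1, the exit language is empty, so the procedure returns the empty set for every bound, exactly as the loop clause predicts.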

Bisimulation semantics. Another, finer, notion of equivalence that we can associate with skip-free automata is bisimilarity.

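Definition 3.8. Given skip-free automata (X, h) and (Y, k), a bisimulation is a relation R ⊆ X × Y such that for any x R y, α ∈ At and p ∈ Σ:

1. if h(x)(α) = ⊥, then k(y)(α) = ⊥;
2. if h(x)(α) = (p, ✓), then k(y)(α) = (p, ✓); and
3. if h(x)(α) = (p, x′), then k(y)(α) = (p, y′) for some y′ ∈ Y with x′ R y′.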


We call x and y bisimilar if x R y for some bisimulation R and write x ↔ y.

In a fixed skip-free automaton (X, h), we define ↔ ⊆ X × X to be the largest bisimulation, called bisimilarity. This is an equivalence relation and a bisimulation.<sup>7</sup> The bisimilarity equivalence class of a state is often called its behaviour.

Example 3.9. In the automaton below, x<sub>1</sub> and x<sub>2</sub> are bisimilar. This is witnessed by the bisimulation {(x<sub>1</sub>, x<sub>2</sub>), (x<sub>2</sub>, x<sub>2</sub>)}.

[Diagram: a skip-free automaton with two states x<sub>1</sub> and x<sub>2</sub>, whose transitions are labelled a|p and ¯a|q.]
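On a finite skip-free automaton, bisimilarity can be computed as a greatest fixpoint: start from the full relation and discard pairs that violate the bisimulation conditions. The sketch below is an illustration under our own assumptions. It reads the conditions off the transition type (both states reject, both accept with the same action, or both step with the same action to related states), and the automaton `h` is hypothetical.

```python
ACCEPT = "<accept>"  # hypothetical marker for successful termination

def compatible(tx, ty, rel):
    """Check one atom: tx and ty are h(x)(alpha) and h(y)(alpha), None meaning rejection."""
    if tx is None or ty is None:
        return tx is None and ty is None          # both reject
    (p, x2), (q, y2) = tx, ty
    if p != q:
        return False                              # different actions
    if x2 == ACCEPT or y2 == ACCEPT:
        return x2 == ACCEPT and y2 == ACCEPT      # both accept
    return (x2, y2) in rel                        # both step to related states

def bisimilarity(h, atoms):
    """Largest bisimulation on the finite automaton h: state -> {atom: (action, target)}."""
    rel = {(x, y) for x in h for y in h}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(rel):
            if not all(compatible(h[x].get(a), h[y].get(a), rel) for a in atoms):
                rel.discard((x, y))
                changed = True
    return rel

h = {
    "x1": {"a": ("p", "x2"), "na": ("q", ACCEPT)},
    "x2": {"a": ("p", "x2"), "na": ("q", ACCEPT)},
    "spin": {"a": ("p", "spin"), "na": ("p", "spin")},   # performs p forever
    "dead": {},                                          # rejects immediately
}
rel = bisimilarity(h, ["a", "na"])
```

Here `spin` and `dead` both accept the empty language, yet the pair is removed from `rel` because `spin` can perform an action where `dead` rejects.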

We can also use bisimulations to witness language equivalence.

Lemma 3.10. Let e<sub>1</sub>, e<sub>2</sub> ∈ GExp<sup>−</sup>. If e<sub>1</sub> ↔ e<sub>2</sub>, then L(e<sub>1</sub>) = L(e<sub>2</sub>).

The converse of Lemma 3.10 is not true. Consider, for example, the program p<sup>(1)</sup>q that repeats the atomic action p ∈ Σ indefinitely, never reaching q. Since

$$\mathcal{L}(p^{(1)}q) = \bigcup\_{n \in \mathbb{N}} \mathcal{L}(p)^n \cdot \emptyset = \emptyset = \mathcal{L}(0)$$

we know that p<sup>(1)</sup>q ∼<sup>L</sup> 0. But p<sup>(1)</sup>q and 0 are not bisimilar, since Fig. 4 tells us that p<sup>(1)</sup>q −α|p→ p<sup>(1)</sup>q and 0 ↓ α, which together refute Definition 3.8.1.

#### 3.2 Axioms

Next, we give an inference system for bisimilarity and language equivalence consisting of equations and equational inference rules. The axioms of skip-free GKAT are given in Fig. 2. They include the equation (†), which says that early deadlock is the same as late deadlock. This is sound with respect to the language interpretation, meaning that (†) is true if x is replaced with a skip-free guarded language, but it is not sound with respect to the bisimulation semantics. For example, the expressions p · 0 and 0 are not bisimilar for any p ∈ Σ. Interestingly, this is the only axiomatic difference between bisimilarity and language equivalence.

<sup>7</sup> This follows directly from seeing skip-free automata as a special type of coalgebra and the fact that the functor involved preserves weak pullbacks [36]. In fact, coalgebra has been an indispensable tool in the production of the current paper, guiding us to the correct definitions and simplifying many of the proofs.

Remark 3.11. The underlying logical structure of our inference systems is equational logic [5], meaning that provable equivalence is an equivalence relation that is preserved by the algebraic operations.

Given expressions e<sub>1</sub>, e<sub>2</sub> ∈ GExp<sup>−</sup>, we write e<sub>1</sub> ≡† e<sub>2</sub> and say that e<sub>1</sub> and e<sub>2</sub> are ≡†-equivalent if the equation e<sub>1</sub> = e<sub>2</sub> can be derived from the axioms in Fig. 2 without the axiom marked (†). We write e<sub>1</sub> ≡ e<sub>2</sub> and say that e<sub>1</sub> and e<sub>2</sub> are ≡-equivalent if e<sub>1</sub> = e<sub>2</sub> can be derived from the whole set of axioms in Fig. 2.

The axioms in Fig. 2 are sound with respect to the respective semantics they axiomatize. The only axiom that is not sound w.r.t. bisimilarity is x · 0 ≡ 0, as this would relate automata with different behaviours (x may permit some action to be performed, and this is observable in the bisimulation).

Theorem 3.12 (Soundness). For any e<sub>1</sub>, e<sub>2</sub> ∈ GExp<sup>−</sup>,

1. If e<sub>1</sub> ≡† e<sub>2</sub>, then e<sub>1</sub> ↔ e<sub>2</sub>.
2. If e<sub>1</sub> ≡ e<sub>2</sub>, then e<sub>1</sub> ∼<sup>L</sup> e<sub>2</sub>.

We consider the next two results, which are jointly converse to Theorem 3.12, to be the main theorems of this paper. They state that the axioms in Fig. 2 are complete for bisimilarity and language equivalence respectively, i.e., they describe a complete set of program transformations for skip-free GKAT.

Theorem 3.13 (Completeness I). If e<sub>1</sub> ↔ e<sub>2</sub>, then e<sub>1</sub> ≡† e<sub>2</sub>.

Theorem 3.14 (Completeness II). If e<sub>1</sub> ∼<sup>L</sup> e<sub>2</sub>, then e<sub>1</sub> ≡ e<sub>2</sub>.

We prove Theorem 3.13 in Section 5 by drawing a formal analogy between skip-free GKAT and a recent study of regular expressions in the context of process algebra [15]. We include a short overview of this recent work in the next section.

We delay the proof of Theorem 3.14 to Section 6, which uses a separate technique based on the pruning method introduced in [39].

# 4 1-free Star Expressions

Regular expressions were introduced by Kleene [22] as a syntax for the algebra of regular events. Milner offered an alternative interpretation of regular expressions [32], as what he called star behaviours. Based on work of Salomaa [37], Milner proposed a sound axiomatization of the algebra of star behaviours, but left completeness an open problem. After nearly 40 years of active research from the process algebra community, a solution was finally found by Grabmayer [14].

A few years before this result, Grabmayer and Fokkink proved that a suitable restriction of Milner's axioms gives a complete inference system for the behaviour interpretation of a fragment of regular expressions, called the one-free fragment [15]. In this section, we give a quick overview of Grabmayer and Fokkink's one-free fragment [15], slightly adapted to use an alphabet that will be suitable for later use in one of the completeness proofs of skip-free GKAT.

$$\begin{array}{cccc}
\dfrac{}{\alpha p \xrightarrow{\alpha p} \checkmark} & \dfrac{r_1 \xrightarrow{\alpha p} \xi}{r_1 + r_2 \xrightarrow{\alpha p} \xi} & \dfrac{r_2 \xrightarrow{\alpha p} \xi}{r_1 + r_2 \xrightarrow{\alpha p} \xi} & \dfrac{r_1 \xrightarrow{\alpha p} r'}{r_1 r_2 \xrightarrow{\alpha p} r' r_2} \\[2ex]
\dfrac{r_1 \xrightarrow{\alpha p} \checkmark}{r_1 r_2 \xrightarrow{\alpha p} r_2} & \dfrac{r_1 \xrightarrow{\alpha p} r'}{r_1 * r_2 \xrightarrow{\alpha p} r'(r_1 * r_2)} & \dfrac{r_1 \xrightarrow{\alpha p} \checkmark}{r_1 * r_2 \xrightarrow{\alpha p} r_1 * r_2} & \dfrac{r_2 \xrightarrow{\alpha p} \xi}{r_1 * r_2 \xrightarrow{\alpha p} \xi}
\end{array}$$

Here ξ ranges over ✓ + StExp.

Fig. 5. The small-step semantics of one-free star expressions.

Syntax. In the process algebra literature [32,15,14], regular expressions generated by a fixed alphabet A are called star expressions, and denote labelled transition systems (LTSs) with labels drawn from A. As was mentioned in Section 3, skip-free automata can be seen as certain LTSs where the labels are atomic test/atomic action pairs. In Section 5, we encode skip-free GKAT expressions as one-free regular expressions and skip-free automata as LTSs with labels drawn from At · Σ. We instantiate the construction from [15] of the set of star expressions generated by the label set At · Σ.

Definition 4.1. The set StExp of one-free star expressions is given by

$$\mathsf{StExp} \ni r_1, r_2 ::= 0 \mid \alpha p \in \mathsf{At} \cdot \Sigma \mid r_1 + r_2 \mid r_1 r_2 \mid r_1 * r_2$$

Semantics. The semantics of StExp is now an instance of the labelled transition systems that originally appeared in [15], with atomic test/atomic action pairs as labels and a (synthetic) output state ✓ denoting successful termination.

For the rest of this paper, we call a pair (S, t) a labelled transition system when S is a set of states and t: S → P(At·Σ × (✓ + S)) is a transition structure. We write x −αp→ y if (αp, y) ∈ t(x) and x −αp→ ✓ if (αp, ✓) ∈ t(x).

The set StExp can be given the structure of a labelled transition system (StExp, τ), defined in Fig. 5. If r ∈ StExp, we write ⟨r⟩ for the transition system obtained by restricting τ to the one-free star expressions reachable from r and call ⟨r⟩ the small-step semantics of r.
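As an illustration, the transition structure τ can be sketched as a function returning the set of outgoing transitions of a one-free star expression. The sketch assumes the usual rules for sum, sequencing and the binary star (which is how we read Fig. 5); the tuple encoding and the names `step` and `ACCEPT` are hypothetical.

```python
ACCEPT = "<accept>"  # hypothetical marker for the output state

def step(r):
    """Outgoing transitions of r as a set of ((atom, action), target) pairs."""
    tag = r[0]
    if tag == "zero":
        return set()
    if tag == "atom":                      # ("atom", alpha, p) performs alpha p and accepts
        return {((r[1], r[2]), ACCEPT)}
    if tag == "sum":                       # either summand may move
        return step(r[1]) | step(r[2])
    if tag == "seq":                       # run r1, then continue with r2
        return {(label, r[2] if target == ACCEPT else ("seq", target, r[2]))
                for label, target in step(r[1])}
    if tag == "star":                      # ("star", r1, r2): iterate r1, or exit through r2
        again = {(label, r if target == ACCEPT else ("seq", target, r))
                 for label, target in step(r[1])}
        return again | step(r[2])
    raise ValueError(f"unknown expression: {tag}")

loop = ("star", ("atom", "a", "p"), ("atom", "na", "q"))
choice = ("sum", ("atom", "a", "p"), ("atom", "a", "q"))
```

The expression `choice` has two transitions under the same atom, which shows that the resulting transition systems are nondeterministic in general.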

The bisimulation interpretation of one-free star expressions is subtler than the bisimulation interpretation of skip-free GKAT expressions. The issue is that labelled transition systems (LTSs) are nondeterministic in general: it is possible for an LTS to have both an x −αp→ y and an x −αq→ z transition with p ≠ q or y ≠ z. The appropriate notion of bisimilarity for LTSs can be given as follows.

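Definition 4.2. Given labelled transition systems (S, t) and (T, u), a bisimulation between them is a relation R ⊆ S × T s.t. for any x R y and αp ∈ At · Σ,

1. x −αp→ ✓ if and only if y −αp→ ✓,
2. if x −αp→ x′, then there is a y′ ∈ T such that y −αp→ y′ and x′ R y′, and
3. if y −αp→ y′, then there is an x′ ∈ S such that x −αp→ x′ and x′ R y′.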


As before, we denote the largest bisimulation by ↔. We call x and y bisimilar and write x ↔ y if x R y for some bisimulation R.


Fig. 6. Axioms for equivalence for one-free star expressions.

The following closure properties of bisimulations of LTSs are useful later. They also imply that bisimilarity is an equivalence relation. Like in the skip-free case, the bisimilarity equivalence class of a state is called its behaviour.

Lemma 4.3. Let (S, t), (T, u), and (U, v) be labelled transition systems. Furthermore, let R<sub>1</sub>, R<sub>2</sub> ⊆ S × T and R<sub>3</sub> ⊆ T × U be bisimulations. Then R<sub>1</sub><sup>op</sup> = {(y, x) | x R<sub>1</sub> y}, R<sub>1</sub> ∪ R<sub>2</sub> and R<sub>1</sub> ◦ R<sub>3</sub> are bisimulations.

Axiomatization. We follow [15], where it was shown that the axiomatization found in Fig. 6 is complete with respect to bisimilarity for one-free star expressions. Given a pair r<sub>1</sub>, r<sub>2</sub> ∈ StExp, we write r<sub>1</sub> ≡<sup>∗</sup> r<sub>2</sub> and say that r<sub>1</sub> and r<sub>2</sub> are ≡<sup>∗</sup>-equivalent if the equation r<sub>1</sub> = r<sub>2</sub> can be derived from the axioms in Fig. 6.

The following result is crucial to the next section, where we prove that the axioms of ≡† are complete with respect to bisimilarity in skip-free GKAT.

Theorem 4.4 ([15, Theorem 7.1]). r<sub>1</sub> ↔ r<sub>2</sub> if and only if r<sub>1</sub> ≡<sup>∗</sup> r<sub>2</sub>.

# 5 Completeness for Skip-free Bisimulation GKAT

This section is dedicated to the proof of our first completeness result, Theorem 3.13, which says that the axioms of Fig. 2 (excluding †) are complete with respect to bisimilarity in skip-free GKAT. Our proof strategy is a reduction of our completeness result to the completeness result for StExp (Theorem 4.4).

The key objects of interest in the reduction are a pair of translations: one translation turns skip-free GKAT expressions into one-free star expressions and maintains bisimilarity, and the other translation turns (certain) one-free star expressions into skip-free GKAT expressions and maintains provable bisimilarity.

We first discuss the translation between automata and labelled transition systems, which preserves and reflects bisimilarity. We then introduce the syntactic translations and present the completeness proof.

#### 5.1 Transforming skip-free automata to labelled transition systems

We can easily transform a skip-free automaton into an LTS by essentially turning −α|p→ transitions into −αp→ transitions. This can be formalized as follows.

Definition 5.1. Given a set X, we define grph<sub>X</sub> : (⊥ + Σ × (✓ + X))<sup>At</sup> → P(At·Σ × (✓ + X)) to be grph<sub>X</sub>(θ) = {(αp, x) | θ(α) = (p, x)}. Given a skip-free automaton (X, h), we define grph<sup>∗</sup> (X, h) = (X, grph<sub>X</sub> ◦ h).

The function grph<sub>X</sub> is injective: as its name suggests, grph<sub>X</sub>(θ) is essentially the graph of θ when viewed as a partial function from At to Σ × (✓ + X). This implies that the transformation grph<sup>∗</sup> of skip-free automata into LTSs preserves and reflects bisimilarity.

Lemma 5.2. Let x, y ∈ X, and (X, h) be a skip-free automaton. Then x ↔ y in (X, h) if and only if x ↔ y in grph<sup>∗</sup> (X, h).

Leading up to the proof of Theorem 3.13, we also need to undo the effect of grph<sup>∗</sup> on skip-free automata with a transformation that takes every LTS of the form grph<sup>∗</sup> (X, h) to its underlying skip-free automaton (X, h).

The LTSs that can be written in the form grph<sup>∗</sup> (X, h) for some skip-free automaton (X, h) can be described as follows. Call a set U ∈ P(At·Σ × (✓ + X)) graph-like if whenever (αp, x) ∈ U and (αq, y) ∈ U, then p = q and x = y. An LTS (S, t) is deterministic if t(s) is graph-like for every s ∈ S.

Lemma 5.3. An LTS (S, t) is deterministic if and only if (S, t) = grph<sup>∗</sup> (X, h) for some skip-free automaton (X, h).

Remark 5.4. As mentioned in Footnote 7, there is a coalgebraic outlook in many of the technical details in the present paper. For the interested reader, grph and func are actually natural transformations between the functors whose coalgebras correspond to skip-free automata and labelled transition systems, and are furthermore inverse to one another. This implies that grph<sup>∗</sup> and func<sup>∗</sup> witness an isomorphism between the categories of skip-free automata and deterministic LTSs.
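The passage between transition maps and graph-like sets can be sketched as follows. The dictionary encoding is our own assumption, and since func is not spelled out in this section, the sketch takes it to be the evident inverse of grph on graph-like sets.

```python
ACCEPT = "<accept>"  # hypothetical marker for successful termination

def grph(theta):
    """Graph of a transition map theta: atom -> (action, target).

    Atoms missing from theta are the ones on which the state rejects.
    """
    return {((alpha, p), x) for alpha, (p, x) in theta.items()}

def is_graph_like(U):
    """True when every atom occurs with at most one (action, target) pair."""
    seen = {}
    for (alpha, p), x in U:
        if seen.setdefault(alpha, (p, x)) != (p, x):
            return False
    return True

def func(U):
    """Assumed inverse of grph, defined on graph-like sets only."""
    if not is_graph_like(U):
        raise ValueError("not graph-like: some atom has two outgoing transitions")
    return {alpha: (p, x) for (alpha, p), x in U}

theta = {"a": ("p", "x"), "na": ("q", ACCEPT)}
clash = {(("a", "p"), "x"), (("a", "q"), "y")}
```

Applying `func` after `grph` returns the original map, while `clash` is rejected because the atom a carries two different transitions.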

#### 5.2 Translating Syntax

We can mimic the transformation of skip-free automata into deterministic labelled transition systems and vice-versa by a pair of syntactic translations going back and forth between skip-free GKAT expressions and certain one-free star expressions. Similar to how only some labelled transition systems can be turned into skip-free automata, only some one-free star expressions have corresponding skip-free GKAT expressions—the deterministic ones.

The definition of deterministic expressions requires the following notation: given a test b ∈ BExp, we define b · r inductively on r ∈ StExp as follows:

$$b \cdot 0 = 0 \qquad b \cdot \alpha p = \begin{cases} \alpha p & \alpha \le b \\ 0 & \alpha \not\le b \end{cases} \qquad b \cdot (r\_1 + r\_2) = b \cdot r\_1 + b \cdot r\_2$$

$$b \cdot (r\_1 r\_2) = (b \cdot r\_1) r\_2 \qquad b \cdot (r\_1 \ast r\_2) = (b \cdot r\_1)(r\_1 \ast r\_2) + b \cdot r\_2$$

for any αp ∈ At · Σ and r1, r<sup>2</sup> ∈ StExp.

Definition 5.5. The set of deterministic one-free star expressions is the smallest subset Det ⊆ StExp such that 0 ∈ Det and αp ∈ Det for any α ∈ At and p ∈ Σ, and for any r<sub>1</sub>, r<sub>2</sub> ∈ Det and b ∈ BExp, we have b · r<sub>1</sub> + ¯b · r<sub>2</sub>, r<sub>1</sub>r<sub>2</sub>, and (b · r<sub>1</sub>) ∗ (¯b · r<sub>2</sub>) ∈ Det.

From GExp<sup>−</sup> to Det. We can now present the translation of skip-free expressions to deterministic one-free star expressions.

Definition 5.6. We define the translation function gtr : GExp<sup>−</sup> → Det by

$$\operatorname{gtr}(0) = 0 \qquad \operatorname{gtr}(p) = \sum\_{\alpha \in \mathsf{At}} \alpha p \qquad \operatorname{gtr}(e\_1 +\_b e\_2) = b \cdot \operatorname{gtr}(e\_1) + \bar{b} \cdot \operatorname{gtr}(e\_2)$$

$$\operatorname{gtr}(e_1 \cdot e_2) = \operatorname{gtr}(e_1)\operatorname{gtr}(e_2) \qquad \operatorname{gtr}(e_1^{(b)} e_2) = (b \cdot \operatorname{gtr}(e_1)) * (\bar{b} \cdot \operatorname{gtr}(e_2))$$

for any b ∈ BExp, p ∈ Σ, e<sub>1</sub>, e<sub>2</sub> ∈ GExp<sup>−</sup>.

Remark 5.7. In Definition 5.6, we make use of a generalized sum ∑<sub>α∈At</sub>. Technically, this requires we fix an enumeration of At ahead of time, say At = {α<sub>1</sub>, . . . , α<sub>n</sub>}, at which point we can define ∑<sub>α∈At</sub> r<sub>α</sub> = r<sub>α<sub>1</sub></sub> + · · · + r<sub>α<sub>n</sub></sub>. Of course, + is commutative and associative up to ≡<sup>∗</sup>, so the actual ordering of this sum does not matter as far as equivalence is concerned.
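The guard operation b · r and the translation gtr are both defined by structural recursion, which the following sketch mirrors. The tuple encodings, the representation of a test as the set of atoms below it, and the use of sorted order as the fixed enumeration of At are our own assumptions.

```python
ZERO = ("zero",)

def restrict(b, r):
    """b . r on one-free star expressions; b is the set of atoms below the test."""
    tag = r[0]
    if tag == "zero":
        return ZERO
    if tag == "atom":                     # ("atom", alpha, p)
        return r if r[1] in b else ZERO
    if tag == "sum":
        return ("sum", restrict(b, r[1]), restrict(b, r[2]))
    if tag == "seq":                      # only the first factor is guarded
        return ("seq", restrict(b, r[1]), r[2])
    if tag == "star":                     # b.(r1 * r2) = (b.r1)(r1 * r2) + b.r2
        return ("sum", ("seq", restrict(b, r[1]), r), restrict(b, r[2]))
    raise ValueError(f"unknown star expression: {tag}")

def gtr(e, At):
    """Translate a skip-free GKAT expression into a one-free star expression."""
    tag = e[0]
    if tag == "zero":
        return ZERO
    if tag == "act":                      # sum over a fixed enumeration of At
        terms = [("atom", alpha, e[1]) for alpha in sorted(At)]
        r = terms[0]
        for t in terms[1:]:
            r = ("sum", r, t)
        return r
    if tag == "ite":                      # ("ite", b, e1, e2) encodes e1 +_b e2
        _, b, e1, e2 = e
        return ("sum", restrict(b, gtr(e1, At)), restrict(At - b, gtr(e2, At)))
    if tag == "seq":
        return ("seq", gtr(e[1], At), gtr(e[2], At))
    if tag == "loop":                     # ("loop", b, e1, e2) encodes e1^(b) e2
        _, b, e1, e2 = e
        return ("star", restrict(b, gtr(e1, At)), restrict(At - b, gtr(e2, At)))
    raise ValueError(f"unknown expression: {tag}")

At = frozenset({"a", "na"})
b = frozenset({"a"})
```

Note that `restrict` is applied to the translated subexpressions, since guarding is only defined on star expressions.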

The most salient feature of this translation is that it respects bisimilarity.

Lemma 5.8. The graph of the translation function gtr is a bisimulation of labelled transition systems between grph<sup>∗</sup> (GExp<sup>−</sup>, ∂) and (StExp, τ ). Consequently, if e<sup>1</sup> ↔ e<sup>2</sup> in grph<sup>∗</sup> (GExp<sup>−</sup>, ∂), then gtr(e1) ↔ gtr(e2) in (StExp, τ ).

From Det to GExp<sup>−</sup>. We would now like to define a back translation function rtg : Det → GExp<sup>−</sup> by induction on its argument. Looking at Definition 5.5, one might be tempted to write rtg(b · r<sub>1</sub> + ¯b · r<sub>2</sub>) = rtg(r<sub>1</sub>) +<sub>b</sub> rtg(r<sub>2</sub>), but there can be distinct b, c ∈ BExp such that b · r<sub>1</sub> + ¯b · r<sub>2</sub> = c · r<sub>1</sub> + ¯c · r<sub>2</sub>, even when b and c have different atoms.

Definition 5.9. Say that r<sub>1</sub>, r<sub>2</sub> ∈ StExp are separated by b ∈ BExp if r<sub>1</sub> = b · r<sub>1</sub> and r<sub>2</sub> = ¯b · r<sub>2</sub>. If such a b exists we say that r<sub>1</sub> and r<sub>2</sub> are separated.

Another way to define Det is therefore to say that Det is the smallest subset of StExp containing 0 and At·Σ that is closed under sequential composition and closed under unions and stars of separated one-free star expressions.

Suppose r<sub>1</sub> and r<sub>2</sub> are separated by both b and c. Then one can prove that (b ∨ c)r<sub>1</sub> ≡<sup>∗</sup> br<sub>1</sub> + cr<sub>1</sub> ≡<sup>∗</sup> r<sub>1</sub> and, for the complement of b ∨ c, that ¯(b ∨ c)r<sub>2</sub> = (¯b ∧ ¯c)r<sub>2</sub> ≡<sup>∗</sup> ¯b(¯cr<sub>2</sub>) ≡<sup>∗</sup> r<sub>2</sub>, so r<sub>1</sub> and r<sub>2</sub> are separated by b ∨ c as well. Since there are only finitely many Boolean expressions up to equivalence, there is a maximal (weakest) test b(r<sub>1</sub>, r<sub>2</sub>) ∈ BExp such that r<sub>1</sub> and r<sub>2</sub> are separated by b(r<sub>1</sub>, r<sub>2</sub>).

Definition 5.10. The back translation rtg : Det → GExp<sup>−</sup> is defined by

$$\begin{aligned} \text{rtg}(0) &= 0 & \text{rtg}(\alpha p) &= p +\_{\alpha} 0 & \text{rtg}(r\_1 + r\_2) &= \text{rtg}(r\_1) +\_{b(r\_1, r\_2)} \text{rtg}(r\_2) \\ \text{rtg}(r\_1 r\_2) &= \text{rtg}(r\_1) \cdot \text{rtg}(r\_2) & \text{rtg}(r\_1 \ast r\_2) &= \text{rtg}(r\_1)^{(b(r\_1, r\_2))} \text{rtg}(r\_2) \end{aligned}$$

for any r<sub>1</sub>, r<sub>2</sub> ∈ StExp. In the union and star cases, we may use that r<sub>1</sub> and r<sub>2</sub> are separated (by definition of Det), so that b(r<sub>1</sub>, r<sub>2</sub>) is well-defined.

The most salient property of rtg is that it preserves provable equivalence.

Lemma 5.11. Let r<sub>1</sub>, r<sub>2</sub> ∈ Det. If r<sub>1</sub> ≡<sup>∗</sup> r<sub>2</sub>, then rtg(r<sub>1</sub>) ≡† rtg(r<sub>2</sub>).

The last fact needed in the proof of completeness is that, up to provable equivalence, every skip-free GKAT expression is equivalent to its back-translation.

Lemma 5.12. For any e ∈ GExp<sup>−</sup>, e ≡† rtg(gtr(e)).

We are now ready to prove Theorem 3.13, that provable bisimilarity is complete with respect to behavioural equivalence in skip-free GKAT.

Theorem 3.13 (Completeness I). If e<sub>1</sub> ↔ e<sub>2</sub>, then e<sub>1</sub> ≡† e<sub>2</sub>.

Proof. Let e<sub>1</sub>, e<sub>2</sub> ∈ GExp<sup>−</sup> be a bisimilar pair of skip-free GKAT expressions. By Lemma 5.2, e<sub>1</sub> and e<sub>2</sub> are bisimilar in grph<sup>∗</sup> (GExp<sup>−</sup>, ∂). By Lemmas 4.3 and 5.8, the translation gtr : grph<sup>∗</sup> (GExp<sup>−</sup>, ∂) → (StExp, τ) preserves bisimilarity, so gtr(e<sub>1</sub>) and gtr(e<sub>2</sub>) are bisimilar in (StExp, τ) as well. By Theorem 4.4, gtr(e<sub>1</sub>) ≡<sup>∗</sup> gtr(e<sub>2</sub>). Therefore, by Lemma 5.11, rtg(gtr(e<sub>1</sub>)) ≡† rtg(gtr(e<sub>2</sub>)). Finally, by Lemma 5.12, we have e<sub>1</sub> ≡† rtg(gtr(e<sub>1</sub>)) ≡† rtg(gtr(e<sub>2</sub>)) ≡† e<sub>2</sub>.

# 6 Completeness for Skip-free GKAT

The previous section establishes that ≡†-equivalence coincides with bisimilarity for skip-free GKAT expressions by reducing the completeness problem of skip-free GKAT up to bisimilarity to a solved completeness problem, namely that of one-free star expressions up to bisimilarity. In this section we prove a completeness result for skip-free GKAT up to language equivalence. We show this can be achieved by reducing it to the completeness problem of skip-free GKAT up to bisimilarity, which we solved in the previous section.

Despite bisimilarity being a less traditional equivalence in the context of Kleene algebra, this reduction simplifies the completeness proof greatly, and justifies the study of bisimilarity in the pursuit of completeness for GKAT.

The axiom x · 0 = 0 (which is the only difference between skip-free GKAT up to language equivalence and skip-free GKAT up to bisimilarity) indicates that the only semantic difference between bisimilarity and language equivalence in skip-free GKAT is early termination. This motivates our reduction to skip-free GKAT up to bisimilarity below, which involves reducing each skip-free expression to an expression representing only the successfully terminating branches of execution.

Now let us turn to the formal proof of Theorem 3.14, which says that if e, f ∈ GExp<sup>−</sup> are such that L(e) = L(f), then e ≡ f. In a nutshell, our strategy is to produce two terms ⌊e⌋, ⌊f⌋ ∈ GExp<sup>−</sup> such that e ≡ ⌊e⌋, f ≡ ⌊f⌋ and ⌊e⌋ ↔ ⌊f⌋ in (GExp<sup>−</sup>, ∂). The latter property tells us that ⌊e⌋ ≡† ⌊f⌋ by Theorem 3.13, which allows us to conclude e ≡ f. The expression ⌊e⌋ can be thought of as the early termination version of e, obtained by pruning the branches of its execution that cannot end in successful termination.

To properly define the transformation ⌊−⌋ on expressions, we need the notion of a dead state in a skip-free automaton, analogous to a similar notion from [42].

Definition 6.1. Let (X, h) be a skip-free automaton. The set D(X, h) is the largest subset of X such that for all x ∈ D(X, h) and α ∈ At, either h(x)(α) = ⊥ or h(x)(α) ∈ Σ × D(X, h). When x ∈ D(X, h), x is dead; otherwise, it is live.

In the sequel, we say e ∈ GExp<sup>−</sup> is dead when e is a dead state in (GExp−, ∂), i.e., when e ∈ D(GExp−, ∂). Whether e is dead can be determined by a simple depth-first search, since e can reach only finitely many expressions by ∂. The axioms of skip-free GKAT can also tell when a skip-free expression is dead.
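For a finite skip-free automaton, the dead states can be computed as the complement of the live ones. The sketch below uses a simple fixpoint iteration instead of the depth-first search mentioned above; the dictionary encoding, the marker `ACCEPT`, and the example automaton are our own assumptions.

```python
ACCEPT = "<accept>"  # hypothetical marker for successful termination

def dead_states(h):
    """Dead states of a finite automaton h: state -> {atom: (action, target)}.

    A state is live when some transition accepts or leads to a live state;
    the dead states D(X, h) are exactly the remaining ones.
    """
    live = set()
    changed = True
    while changed:
        changed = False
        for x, transitions in h.items():
            if x in live:
                continue
            if any(target == ACCEPT or target in live
                   for (_, target) in transitions.values()):
                live.add(x)
                changed = True
    return set(h) - live

h = {
    "x": {"a": ("p", "d")},                       # can only reach d
    "d": {"a": ("p", "d")},                       # loops forever, never accepts
    "y": {"a": ("p", "d"), "na": ("q", ACCEPT)},  # accepts under "na"
    "z": {"a": ("p", "y")},                       # reaches y
}
```

The live states form a least fixpoint, so their complement is the largest set closed under the condition of Definition 6.1.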

Lemma 6.2. Let e ∈ GExp<sup>−</sup>. If e is dead, then e ≡ 0.

We are now ready to define ⌊−⌋, the transformation on expressions promised above. The intuition here is to prune the dead subterms of e by recursive descent; whenever we find a part that can never lead to acceptance, we set it to 0.

Definition 6.3. Let e ∈ GExp<sup>−</sup> and a ∈ BExp. In the sequel we use ae as a shorthand for e +<sub>a</sub> 0. We furthermore define ⌊e⌋ inductively, as follows:

$$\begin{aligned} \lfloor 0 \rfloor &= 0 \qquad \lfloor p \rfloor = p \qquad \lfloor e_1 +_b e_2 \rfloor = \lfloor e_1 \rfloor +_b \lfloor e_2 \rfloor \\ \lfloor e_1 \cdot e_2 \rfloor &= \begin{cases} 0 & e_2 \text{ is dead} \\ \lfloor e_1 \rfloor \cdot \lfloor e_2 \rfloor & \text{otherwise} \end{cases} \\ \lfloor e_1^{(b)} e_2 \rfloor &= \begin{cases} 0 & \bar{b} e_2 \text{ is dead} \\ \lfloor e_1 \rfloor^{(b)} \lfloor e_2 \rfloor & \text{otherwise} \end{cases} \end{aligned}$$

The transformation defined above yields a term that is ≡-equivalent to e, provided that we include the early termination axiom e · 0 ≡ 0. The proof is a simple induction on e, using Lemma 6.2.

Lemma 6.4. For any e ∈ GExp<sup>−</sup>, e ≡ ⌊e⌋.

It remains to show that if L(e) = L(f), then ⌊e⌋ and ⌊f⌋ are bisimilar. To this end, we need to relate the language semantics of e and f to their behaviour. As a first step, we note that behaviour that never leads to acceptance can be pruned from a skip-free automaton by removing transitions into dead states.

Definition 6.5. Let (X, h) be a skip-free automaton. Define ⌊h⌋ : X → GX, where GX = (⊥ + Σ × (✓ + X))<sup>At</sup> is the codomain of a transition structure on X, by

$$\lfloor h \rfloor(x)(\alpha) = \begin{cases} \bot & h(x)(\alpha) = (p, x'), \ x' \text{ is dead} \\ h(x)(\alpha) & \text{otherwise} \end{cases}$$
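On a finite automaton, the pruning of Definition 6.5 amounts to deleting transitions whose target is dead. In the sketch below the set of dead states is passed in as a parameter; the dictionary encoding and the example automaton are our own assumptions, and deleting an entry stands for mapping the atom to ⊥.

```python
ACCEPT = "<accept>"  # hypothetical marker for successful termination

def prune(h, dead):
    """Remove every transition of h that leads into a dead state.

    h maps each state to {atom: (action, target)}; accepting transitions
    are kept because ACCEPT is never a member of dead.
    """
    return {
        x: {alpha: (p, target)
            for alpha, (p, target) in transitions.items()
            if target not in dead}
        for x, transitions in h.items()
    }

h = {
    "x": {"a": ("p", "d")},
    "d": {"a": ("p", "d")},
    "y": {"a": ("p", "d"), "na": ("q", ACCEPT)},
    "z": {"a": ("p", "y")},
}
dead = {"x", "d"}          # the states of h that cannot reach acceptance
pruned = prune(h, dead)
```

After pruning, the dead states `x` and `d` both reject on every atom, so they are indistinguishable from one another, whereas they differ from the rejecting state only in language-irrelevant behaviour before pruning.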

Moreover, language equivalence of two states in a skip-free automaton implies bisimilarity of those states, but only in the pruned version of that skip-free automaton. The proof works by showing that the relation on X that connects states with the same language is, in fact, a bisimulation in (X, ⌊h⌋).

Lemma 6.6. Let (X, h) be a skip-free automaton and x, y ∈ X. We have

$$\mathcal{L}(x,(X,h)) = \mathcal{L}(y,(X,h)) \implies x \rightleftarrows y \text{ in } (X,\lfloor h \rfloor)$$

The final intermediate property relates the behaviour of states in the pruned skip-free automaton of expressions to the syntactic skip-free automaton.

Lemma 6.7. The graph {(e, ⌊e⌋) | e ∈ GExp<sup>−</sup>} of ⌊−⌋ is a bisimulation of skip-free automata between (GExp<sup>−</sup>, ⌊∂⌋) and (GExp<sup>−</sup>, ∂).

We now have all the ingredients necessary to prove Theorem 3.14.

Theorem 3.14 (Completeness II). If e<sub>1</sub> ∼<sup>L</sup> e<sub>2</sub>, then e<sub>1</sub> ≡ e<sub>2</sub>.

Proof. If e<sub>1</sub> ∼<sup>L</sup> e<sub>2</sub>, then by definition L(e<sub>1</sub>) = L(e<sub>2</sub>). By Lemma 6.6, e<sub>1</sub> ↔ e<sub>2</sub> in (GExp<sup>−</sup>, ⌊∂⌋), which by Lemma 6.7 implies that ⌊e<sub>1</sub>⌋ ↔ ⌊e<sub>2</sub>⌋ in (GExp<sup>−</sup>, ∂). From Theorem 3.13 we know that ⌊e<sub>1</sub>⌋ ≡† ⌊e<sub>2</sub>⌋, and therefore e<sub>1</sub> ≡ e<sub>2</sub> by Lemma 6.4.

# 7 Relation to GKAT

So far we have seen the technical development of skip-free GKAT without much reference to the original development of GKAT as it was presented in [42] and [39]. In this section, we make the case that the semantics of skip-free GKAT is merely a simplified version of the semantics of GKAT, and that the two agree on which expressions are equivalent after embedding skip-free GKAT into GKAT. More precisely, we identify the bisimulation and language semantics of skip-free GKAT given in Section 3 with instances of the existing bisimulation [39] and language [42] semantics of GKAT proper. The main takeaway is that two skip-free GKAT expressions are equivalent in our semantics precisely when they are equivalent when interpreted as proper GKAT expressions in the existing semantics.

#### 7.1 Bisimulation semantics

To connect the bisimulation semantics of skip-free GKAT to GKAT at large, we start by recalling the latter. To do this, we need to define GKAT automata.

Definition 7.1. A (GKAT) automaton is a pair (X, d) such that X is a set and d : X → (⊥ + ✓ + Σ × X)<sup>At</sup> is a function called the transition function. We write x −α|p→ y to denote d(x)(α) = (p, y), x ⇒ α to denote d(x)(α) = ✓, and x ↓ α to denote d(x)(α) = ⊥.

Automata can be equipped with their own notion of bisimulation.<sup>8</sup>

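Definition 7.2. Given automata (X, h) and (Y, k), a bisimulation between them is a relation R ⊆ X × Y such that if x R y, α ∈ At and p ∈ Σ:

1. if h(x)(α) = ⊥, then k(y)(α) = ⊥;
2. if h(x)(α) = ✓, then k(y)(α) = ✓; and
3. if h(x)(α) = (p, x′), then k(y)(α) = (p, y′) for some y′ ∈ Y such that x′ R y′.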


<sup>8</sup> As in previous sections, automata can be studied as coalgebras for a given functor and the notions below are instances of general abstract notions [17,36].

$$\begin{array}{c}
\dfrac{\alpha \le b}{b \Rightarrow \alpha} \qquad \dfrac{}{p \xrightarrow{\alpha|p} 1} \qquad \dfrac{\alpha \le b \quad e_1 \Rightarrow \alpha}{e_1 +_b e_2 \Rightarrow \alpha} \qquad \dfrac{\alpha \le \bar{b} \quad e_2 \Rightarrow \alpha}{e_1 +_b e_2 \Rightarrow \alpha} \\[2ex]
\dfrac{\alpha \le b \quad e_1 \xrightarrow{\alpha|p} e'}{e_1 +_b e_2 \xrightarrow{\alpha|p} e'} \qquad \dfrac{\alpha \le \bar{b} \quad e_2 \xrightarrow{\alpha|p} e'}{e_1 +_b e_2 \xrightarrow{\alpha|p} e'} \qquad \dfrac{e_1 \Rightarrow \alpha \quad e_2 \Rightarrow \alpha}{e_1 \cdot e_2 \Rightarrow \alpha} \\[2ex]
\dfrac{e_1 \Rightarrow \alpha \quad e_2 \xrightarrow{\alpha|p} e'}{e_1 \cdot e_2 \xrightarrow{\alpha|p} e'} \qquad \dfrac{e_1 \xrightarrow{\alpha|p} e'}{e_1 \cdot e_2 \xrightarrow{\alpha|p} e' \,\#\, e_2} \qquad \dfrac{\alpha \le b \quad e \xrightarrow{\alpha|p} e'}{e^{(b)} \xrightarrow{\alpha|p} e' \,\#\, e^{(b)}} \qquad \dfrac{\alpha \le \bar{b}}{e^{(b)} \Rightarrow \alpha}
\end{array}$$

Fig. 7. The transition function δ : GExp → (⊥ + X + Σ × GExp) At defined inductively. Here, <sup>e</sup><sup>1</sup> #e<sup>2</sup> is <sup>e</sup><sup>2</sup> when <sup>e</sup> = 1 and <sup>e</sup><sup>1</sup> ·e<sup>2</sup> otherwise, <sup>b</sup> <sup>∈</sup> BExp, <sup>p</sup> <sup>∈</sup> <sup>Σ</sup>, and e, e<sup>0</sup> , e<sup>i</sup> ∈ GExp.

We call x and y bisimilar and write x ↔ y if x R y for some bisimulation R.

Remark 7.3. The properties listed above are implications, but it is not hard to show that if all three properties hold for R, then so do all of their symmetric counterparts. For instance, if k(y)(α) = (p, y′), then certainly h(x)(α) must be of the form (q, x′), which then implies that q = p while x′ R y′.

Two GKAT expressions are bisimilar when they are bisimilar as states in the syntactic automaton [39], (GExp, δ), summarised in Fig. 7.

Remark 7.4. The definition of δ given above diverges slightly from the definition in [39]. Fortunately, this does not make a difference in terms of the bisimulation semantics: two expressions are bisimilar in (GExp, δ) if and only if they are bisimilar in the original semantics. The full version [40] contains a detailed account.

There is a fairly easy way to convert a skip-free automaton into a GKAT automaton: simply reroute all accepting transitions into a new state ⊤ that accepts immediately, and leave the other transitions the same.

Definition 7.5. Given a skip-free automaton (X, d), we define the automaton embed(X, d) = (X + ⊤, $\tilde{d}$), where $\tilde{d}$ is defined by

$$\tilde{d}(x)(\alpha) = \begin{cases} \checkmark & x = \top \\ (p, \top) & d(x)(\alpha) = (p, \checkmark) \\ d(x)(\alpha) & \text{otherwise} \end{cases}$$
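As a minimal sketch of Definition 7.5, assume a finite skip-free automaton encoded as a Python dictionary. The encoding and the names `TICK`, `TOP`, and `embed` are illustrative assumptions and do not appear in the paper. A missing key stands for an undefined transition.

```python
TICK = "✓"  # hypothetical marker for immediate acceptance
TOP = "⊤"   # hypothetical name for the fresh accepting state


def embed(states, atoms, d):
    """Sketch of Definition 7.5 on a finite skip-free automaton.

    d maps (state, atom) to (action, TICK) or (action, next_state);
    a missing key models an undefined transition. The returned map
    sends (state, atom) to TICK or to (action, next_state).
    """
    assert TOP not in states, "the added state must be fresh"
    d_tilde = {}
    # The new state accepts immediately under every atom.
    for a in atoms:
        d_tilde[(TOP, a)] = TICK
    for (x, a), (p, target) in d.items():
        # Reroute accepting transitions into TOP; keep all others.
        d_tilde[(x, a)] = (p, TOP) if target == TICK else (p, target)
    return set(states) | {TOP}, d_tilde


# Example: x does p and moves to y under atom "a";
# y does q and accepts under atom "b".
example_d = {("x", "a"): ("p", "y"), ("y", "b"): ("q", TICK)}
example_states, example_d_tilde = embed({"x", "y"}, {"a", "b"}, example_d)
```

In the example, the accepting transition of `y` is redirected to `TOP`, while the transition of `x` is unchanged and the undefined entries stay undefined.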

We can show that two states are bisimilar in a skip-free automaton if and only if these same states are bisimilar in the corresponding GKAT automaton.

Lemma 7.6. Let (X, d) be a skip-free automaton, and let x, y ∈ X. Then

$$x \leftrightarrow y \text{ in } (X, d) \iff x \leftrightarrow y \text{ in } \mathsf{embed}(X, d).$$

The syntactic skip-free automaton (GExp<sup>−</sup>, ∂) can of course be converted to a GKAT automaton in this way. It turns out that there is a very natural way of correlating this automaton to the syntactic GKAT automaton (GExp, δ).

Lemma 7.7. The relation {(e, e) : e ∈ GExp<sup>−</sup>} ∪ {(⊤, 1)} is a bisimulation between embed(GExp<sup>−</sup>, ∂) and (GExp, δ).

We now have everything to relate the bisimulation semantics of skip-free GKAT expressions to the bisimulation semantics of GKAT expressions at large.

Lemma 7.8. Let e, f ∈ GExp<sup>−</sup>. The following holds:

$$e \leftrightarrow f \text{ in } (\mathsf{GExp}^-, \partial) \iff e \leftrightarrow f \text{ in } (\mathsf{GExp}, \delta)$$

Proof. We derive using Lemmas 7.6 and 7.7, as follows: since the graph of embed is a bisimulation, e ↔ f in (GExp<sup>−</sup>, ∂) if and only if e ↔ f in embed(GExp<sup>−</sup>, ∂), if and only if e ↔ f in (GExp, δ). In the last step, we use the fact that if R is a bisimulation (of automata) between (X, h) and (Y, k), and S is a bisimulation between (Y, k) and (Z, ℓ), then R ◦ S is a bisimulation between (X, h) and (Z, ℓ).

#### 7.2 Language semantics

We now recall the language semantics of GKAT, which is defined in terms of guarded strings [28], i.e., words in the set At · (Σ · At)<sup>∗</sup>, where atoms and actions alternate. In GKAT, successful termination occurs with a trailing associated test, representing the state of the machine at termination. In an execution of the sequential composition of two programs e · f, the test trailing the execution of e needs to match up with an input test compatible with f, otherwise the program crashes at the end of executing e. The following operations on languages of guarded strings record this behaviour by matching the ends of traces on the left with the beginnings of traces on the right.

Definition 7.9. For L, K ⊆ At · (Σ · At)<sup>∗</sup>, define L ⋄ K = {wαx : wα ∈ L, αx ∈ K} and L<sup>(∗)</sup> = ⋃<sub>n∈ℕ</sub> L<sup>(n)</sup>, where L<sup>(n)</sup> is defined inductively by setting L<sup>(0)</sup> = At and L<sup>(n+1)</sup> = L ⋄ L<sup>(n)</sup>.

The language semantics of a GKAT expression is now defined in terms of the composition operators above, as follows.

Definition 7.10. We define $\widehat{\mathcal{L}}$ : GExp → P(At · (Σ · At)<sup>∗</sup>) inductively, as follows:

$$\widehat{\mathcal{L}}(b) = \{\alpha \in \mathsf{At} \mid \alpha \le b\} \qquad \widehat{\mathcal{L}}(p) = \{\alpha p \beta \mid \alpha, \beta \in \mathsf{At}\} \qquad \widehat{\mathcal{L}}(e \cdot f) = \widehat{\mathcal{L}}(e) \diamond \widehat{\mathcal{L}}(f)$$

$$\widehat{\mathcal{L}}(e +_b f) = \widehat{\mathcal{L}}(b) \diamond \widehat{\mathcal{L}}(e) \cup \widehat{\mathcal{L}}(\bar{b}) \diamond \widehat{\mathcal{L}}(f) \qquad \widehat{\mathcal{L}}(e^{(b)}) = (\widehat{\mathcal{L}}(b) \diamond \widehat{\mathcal{L}}(e))^{(*)} \diamond \widehat{\mathcal{L}}(\bar{b})$$
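To make the fusion product and the language clauses concrete, here is a small sketch on finite languages. It assumes that guarded strings are encoded as Python tuples alternating atoms and actions, and that a test is given as the set of atoms below it. The names `fuse`, `star_upto`, `lang_of_test`, and `lang_of_action` are hypothetical, and the star is only unrolled up to a bound `n`, so it under-approximates the full iteration.

```python
def fuse(L, K):
    """Fusion product: {w α x : w α in L, α x in K}."""
    # w ends in an atom and x starts with one; glue them when they agree.
    return {w + x[1:] for w in L for x in K if w[-1] == x[0]}


def star_upto(L, atoms, n):
    """Union of L^(0), ..., L^(n), with L^(0) = At and L^(k+1) = L fused with L^(k)."""
    power = {(a,) for a in atoms}
    result = set(power)
    for _ in range(n):
        power = fuse(L, power)
        result |= power
    return result


def lang_of_test(b):
    """Language of a test, given as the set of atoms below it."""
    return {(a,) for a in b}


def lang_of_action(p, atoms):
    """Language of a primitive action p: all strings α p β."""
    return {(a, p, c) for a in atoms for c in atoms}


atoms = {"t", "f"}          # one primitive test, so two atoms
b, not_b = {"t"}, {"f"}

# Language of p +_b q.
branch = (fuse(lang_of_test(b), lang_of_action("p", atoms))
          | fuse(lang_of_test(not_b), lang_of_action("q", atoms)))

# Language of p^(b), restricted to at most one iteration of the body.
body = fuse(lang_of_test(b), lang_of_action("p", atoms))
loop_upto_1 = fuse(star_upto(body, atoms, 1), lang_of_test(not_b))
```

With at most one unrolling, the loop contributes the string consisting of the atom `f` alone (the guard fails immediately) and the string `t p f` (one iteration, after which the guard fails).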

This semantics is connected to the relational semantics from Definition 2.2:

Theorem 7.11 ([42]). For e, f ∈ GExp, we have $\widehat{\mathcal{L}}(e) = \widehat{\mathcal{L}}(f)$ if and only if ⟦e⟧<sub>σ</sub> = ⟦f⟧<sub>σ</sub> for all relational interpretations σ.

Moreover, since skip-free GKAT expressions are also GKAT expressions, this means that we now have two language interpretations of the former, given by $\widehat{\mathcal{L}}$ and $\mathcal{L}$. Fortunately, one can easily be expressed in terms of the other.


Fig. 8. Axioms for language semantics GKAT (without the Boolean algebra axioms for tests). The function E : GExp → BExp is defined below. If the axiom marked (†) is omitted, the above axiomatizes bisimilarity.

Lemma 7.12. For e ∈ GExp<sup>−</sup>, it holds that $\widehat{\mathcal{L}}(e) = \mathcal{L}(e) \cdot \mathsf{At}$.

As an easy consequence of the above, we find that the two semantics must identify the same skip-free GKAT expressions.

Lemma 7.13. For e, f ∈ GExp<sup>−</sup>, we have $\mathcal{L}(e) = \mathcal{L}(f)$ iff $\widehat{\mathcal{L}}(e) = \widehat{\mathcal{L}}(f)$.

By Theorem 3.14, these properties imply that ≡ also axiomatizes relational equivalence of skip-free GKAT expressions.

Corollary 7.14. For e, f ∈ GExp<sup>−</sup>, we have e ≡ f if and only if ⟦e⟧<sub>σ</sub> = ⟦f⟧<sub>σ</sub> for all relational interpretations σ.

#### 7.3 Equivalences

Finally, we relate equivalences as proved for skip-free GKAT expressions to those provable for GKAT expressions, showing that proofs of equivalence for skip-free GKAT expressions can be replayed in the larger calculus, without (UA).

The axioms of GKAT as presented in [42,39] are provided in Figure 8. We write e ≈† f when e = f is derivable from the axioms in Figure 8 with the exception of (†), and e ≈ f when e = f is derivable from the full set.

The last axiom of GKAT is not really a single axiom, but rather an axiom scheme, parameterized by the function E : GExp → BExp defined as follows:

$$E(b) = b \qquad E(p) = 0 \qquad E(e +_b f) = (b \wedge E(e)) \vee (\overline{b} \wedge E(f))$$

$$E(e \cdot f) = E(e) \wedge E(f) \qquad E(e^{(b)}) = \overline{b}$$

The function E models the analogue of Salomaa's empty word property [37]: we say e is guarded when E(e) is equivalent to 0 by the laws of Boolean algebra. Notice that, as GKAT expressions, skip-free GKAT expressions are always guarded.
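The guardedness check can be sketched in a few lines. The sketch below assumes that tests are represented semantically as sets of atoms, so that "equivalent to 0" becomes "the empty set", and that expressions are encoded as tagged tuples. The tuple encoding and the names `E` and `is_guarded` as Python functions are illustrative assumptions.

```python
def E(e, atoms):
    """Compute E(e) as the set of atoms satisfying it.

    Hypothetical encoding of expressions as tagged tuples:
      ("test", b)          a test b, given as a set of atoms
      ("act", p)           a primitive action p
      ("ite", b, e1, e2)   the guarded choice e1 +_b e2
      ("seq", e1, e2)      the sequential composition e1 . e2
      ("while", b, body)   the guarded loop body^(b)
    """
    kind = e[0]
    if kind == "test":
        return set(e[1])
    if kind == "act":
        return set()
    if kind == "ite":
        _, b, e1, e2 = e
        return (set(b) & E(e1, atoms)) | ((set(atoms) - set(b)) & E(e2, atoms))
    if kind == "seq":
        return E(e[1], atoms) & E(e[2], atoms)
    if kind == "while":
        return set(atoms) - set(e[1])
    raise ValueError(f"unknown expression tag: {kind!r}")


def is_guarded(e, atoms):
    """e is guarded when E(e) denotes 0, i.e. no atom satisfies it."""
    return not E(e, atoms)


atoms = {"t", "f"}
b = {"t"}
p, q = ("act", "p"), ("act", "q")
skip = ("test", atoms)  # the test 1, true on every atom

# A loop followed by an action, as skip-free expressions look inside GKAT.
skip_free_style = ("seq", ("while", b, p), q)
# A conditional with a skip branch, which is not guarded.
with_skip = ("ite", b, p, skip)
```

Here `is_guarded(skip_free_style, atoms)` holds because the trailing action contributes the empty set to the conjunction, while `with_skip` is not guarded because its else-branch can terminate immediately on the atom `f`.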

Since skip-free GKAT expressions are also GKAT expressions, we have four notions of equivalence for GKAT expressions: as skip-free expressions or GKAT expressions in general, either with or without (†). These are related as follows.

Theorem 7.15. Let e, f ∈ GExp<sup>−</sup>. Then (1) e ≈† f if and only if e ≡† f, and (2) e ≈ f if and only if e ≡ f.

Proof. For the forward direction of (1), we note that if e ≈† f, then e ↔ f in (GExp, δ) by Theorem 3.12. By Lemma 7.8, e ↔ f in (GExp<sup>−</sup>, ∂) and therefore e ≡† f by Theorem 3.13. Conversely, note that any proof of e = f by the axioms of Figure 2 can be replayed using the rules from Figure 8. In particular, the guardedness condition needed to derive the last skip-free GKAT axiom from the last GKAT axiom always holds, because E(g) ≈† 0 for any g ∈ GExp<sup>−</sup>.

The proof of the second claim is similar, but uses Theorem 3.13 instead.

# 8 Related Work

This paper fits into a larger research program focused on understanding the logical and algebraic content of programming. Kleene's paper introducing the algebra of regular languages [22] was a foundational contribution to this research program, containing an algebraic account of mechanical programming and some of its sound equational laws. The paper also contained an interesting completeness problem: give a complete description of the equations satisfied by the algebra of regular languages. Salomaa was the first to provide a sound and complete axiomatization of language equivalence for regular expressions [37].

The axiomatization in op. cit. included an inference rule with a side condition that prevented it from being algebraic, in the sense that the validity of an equation is not preserved when substituting arbitrary regular expressions for letters. Nevertheless, this inspired axiomatizations of several variations and extensions of Kleene algebra [46,42,41], as well as Milner's axiomatization of the algebra of star behaviours [32]. The side condition introduced by Salomaa is often called the empty word property, an early version of a concept from process theory called guardedness<sup>9</sup> that is also fundamental to the theory of iteration [6].

Our axiomatization of skip-free GKAT is algebraic due to the lack of a guardedness side-condition (it is an equational Horn theory [31]). This is particularly desirable because it allows for an abundance of other models of the axioms. Kozen proposed an algebraic axiomatization of Kleene algebra that is sound and complete for language equivalence [24], which has become the basis for a number of axiomatizations of other Kleene algebra variants [13,19,20,47] including Kleene algebra with tests [25]. KAT also has a plethora of relational models, which are desirable for reasons we hinted at in Section 2.

GKAT is a fragment of KAT that was first identified in [29]. It was later given a sound and complete axiomatization in [42], although the axiomatization is neither algebraic nor finite (it includes (UA), an axiom scheme that stands for infinitely many axioms). It was later shown that dropping x · 0 = 0 (called (S3) in [42]) from this axiomatization gives a sound and complete axiomatization of bisimilarity [39]. The inspiration for our pruning technique is also in [39], where a reduction of the language equivalence case to the bisimilarity case is discussed.

<sup>9</sup> This is a different use of the word "guarded" than in "guarded Kleene algebra with tests". In the context of process theory, a recursive specification is guarded if each of its function calls occurs within the scope of an operation.

Despite the existence of an algebraic axiomatization of language equivalence in KAT, GKAT has resisted algebraic axiomatization so far. Skip-free GKAT happens to be a fragment of GKAT in which every expression is guarded, thus eliminating the need for the side condition in Fig. 8 and allowing for an algebraic axiomatization. An inequational axiomatization resembling that of KAT might be gleaned from the recent preprint [38], but we have not investigated this carefully. The GKAT axioms for bisimilarity of ground terms can also likely be obtained from the small-step semantics of GKAT using [1,2,3], but unfortunately this does not appear to help with the larger completeness problem.

Reducing one completeness problem to another is a common idea in Kleene algebra; for instance, it is behind the completeness proof of KAT [28]. Cohen also reduced weak Kleene algebra as an axiomatization of star expressions up to simulation to monodic trees [10], whose completeness was conjectured by Takai and Furusawa [45]. Grabmayer's solution to the completeness problem of regular expressions modulo bisimulation [14] can also be seen as a reduction to the one-free case [15], since his crystallization procedure produces an automaton that can be solved using the technique found in op. cit. Other instances of reductions include [9,4,11,47,19,21,30,34,26]. Recent work has started to study reductions and their compositionality properties [11,20,33].

# 9 Discussion

We continue the study of efficient fragments of Kleene Algebra with Tests (KAT) initiated in [42], where the authors introduced Guarded Kleene Algebra with Tests (GKAT) and provided an efficient decision procedure for equivalence. They also proposed a candidate axiomatization, but left open two questions.


In this paper, we identified a large fragment of GKAT, which we call skip-free GKAT (GKAT<sup>−</sup>), that can be axiomatized algebraically without relying on an axiom scheme. We show how the axiomatization works well for two types of equivalence: bisimilarity and language equivalence, by proving completeness results for both semantics. Having the two semantics is interesting from a verification point of view as it gives access to different levels of precision when analyzing program behaviour, but also enables a layered approach to the completeness proofs.

We provide a reduction of the completeness proof for language semantics to the one for bisimilarity. Moreover, the latter is connected to a recently solved [14] problem proposed by Milner. This approach enabled two things: it breaks down the completeness proofs and reuses some of the techniques while also highlighting the exact difference between the two equivalences (captured by the axiom e·0 ≡ 0 which does not hold for bisimilarity). We also showed that proofs of equivalence in skip-free GKAT transfer without any loss to proofs of equivalence in GKAT.

There are several directions for future work. The bridge between process algebra and Kleene algebra has not been exploited to its full potential. The fact that we could reuse results by Grabmayer and Fokkink [14,15] was a major step towards completeness. An independent proof would have been much more complex and very likely required the development of technical tools resembling those in [14,15]. We hope the results in this paper can be taken further and more results can be exchanged between the two communities to solve open problems.

The completeness problem for full GKAT remains open, but our completeness results for skip-free GKAT are encouraging. We believe they show a path towards studying whether an algebraic axiomatization can be devised or a negative result can be proved. A first step in exploring a completeness result would be to try extending Grabmayer's completeness result [14] to a setting with output variables. This is a non-trivial exploration, but we are hopeful that it will yield new tools for completeness. As mentioned in the introduction, NetKAT [4] (and its probabilistic variants [12,43]) has been one of the most successful extensions of KAT. We believe the step from skip-free GKAT to a skip-free guarded version of NetKAT is also a worthwhile exploration. Following [16], we hope to be able to explore these extensions in a modular and parametric way.

Acknowledgements A. Silva and T. Schmid were partially funded by ERC grant Autoprobe (grant agreement 101002697). T. Kappé was supported by the EU's Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No. 101027412 (VERLAN).

# References



# Quorum Tree Abstractions of Consensus Protocols

Berk Cirisci<sup>1</sup>, Constantin Enea<sup>2</sup>, and Suha Orhun Mutluergil<sup>3</sup>

<sup>1</sup> IRIF, Université Paris Cité, Paris, France, cirisci@irif.fr

<sup>2</sup> LIX, Ecole Polytechnique, CNRS and Institut Polytechnique de Paris, Palaiseau, France, cenea@lix.polytechnique.fr

<sup>3</sup> Sabanci University, Istanbul, Turkey, suha.mutluergil@sabanciuniv.edu

Abstract. Distributed algorithms solving agreement problems like consensus or state machine replication are essential components of modern fault-tolerant distributed services. They are also notoriously hard to understand and reason about. Their complexity stems from the different assumptions on the environment they operate with, i.e., process or network link failures, Byzantine failures, etc. In this paper, we propose a novel abstract representation of the dynamics of such protocols which focuses on quorums of responses (votes) to a request (proposal) that form during a run of the protocol. We show that, focusing on such quorums, a run of a protocol can be viewed as working over a tree structure where different branches represent different possible outcomes of the protocol, the goal being to stabilize on the choice of a fixed branch. This abstraction resembles the description of recent protocols used in Blockchain infrastructures, e.g., the protocol supporting Bitcoin or HotStuff. We show that this abstraction supports reasoning about the safety of various algorithms, e.g., Paxos, PBFT, Raft, and HotStuff, in a uniform way. In general, it provides a novel induction-based argument for proving that such protocols are safe.

# 1 Introduction

Consensus or state-machine replication protocols are essential ingredients for maintaining strong consistency in modern fault-tolerant distributed systems. Such protocols must execute in the presence of concurrent and asynchronous message exchanges as well as benign (message loss, process crash) or Byzantine failures (message corruption). Developing practical implementations or reasoning about their correctness is notoriously difficult. Standard examples include the classic Paxos [21] or PBFT [5] protocols, or the more recent HotStuff [37] protocol used in Blockchain infrastructures.

In this paper, we propose a new abstraction for representing the executions of such protocols that can be used, in particular, to reason about their safety, i.e., ensuring Agreement (e.g., all correct processes decide on a single value) and Validity (e.g., the decided value has been proposed by some node participating in the protocol). Usually, protocol executions are composed of a number of communication-closed rounds [11], and each round consists of several phases in which a process broadcasts a request and expects to collect responses from a quorum of processes before advancing to the next phase. The abstraction is defined as a sequential object called Quorum Tree (QTree) which maintains a tree structure where each node corresponds to a different round in an execution. The operations of QTree, to add or change the status of a node, model quorums of responses that have been received in certain phases of a round.

For instance, a round in single-decree Paxos consists of two phases: a prepare phase where a pre-determined leader broadcasts a request for joining that round and expects a quorum of responses from the other processes before advancing to a vote phase where it broadcasts a value to agree upon and expects a quorum of responses (votes) in order to declare that value as decided in that round. Rounds are initiated by their respective leaders and can run concurrently. The idea behind QTree is to represent a Paxos execution using a rooted tree where each node different from the root corresponds to a round where the leader has received a quorum of responses in the prepare phase. The parent-child relation models the data flow from one round to a later round: responses to join requests contain values voted for in previous rounds (if any) and one of them will be included by the leader in the vote phase request. The round in which that value was voted defines the parent. Then, each node has one out of three possible statuses: ADDED if the vote phase can still be successful (the leader can collect a quorum of votes) but this has not happened yet, GHOST if the vote phase cannot be successful (e.g., a majority of processes advanced to the next round without voting), and COMMITTED if the leader has received a quorum of responses in the vote phase. This is a tree structure because before reaching a quorum in the vote phase of a round, other rounds can start and their respective leaders can send other vote requests (with possibly different values). The specific construction of requests and responses in Paxos ensures that all the COMMITTED nodes in this tree belong to a single branch, which entails the agreement property (this will become clearer when presenting the precise definition of QTree in Section 2).

The QTree abstraction is applicable to a wide range of protocols beyond the single-decree Paxos sketched above. It applies to state-machine replication protocols like Raft [36] and HotStuff [37] where the tree structure represents logs of commands (inputted by clients) stored at different processes and organized according to common prefixes (each node corresponds to a single command) and multi-decree consensus protocols like multi-Paxos [21] and its variants [16,26,23,18], or PBFT [5] where different consensus instances (for different indices in a sequence of commands) are modeled using different QTree instances.

We show that all these protocols are refinements of QTree in the sense that their executions can be mapped to sequences of operations on a QTree state, which are about agreeing on a branch of the tree called the trunk. These operations are defined as invocations of two methods add and commit for adding a new leaf to the tree (during which some other nodes may turn to GHOST) and changing the status of a node from ADDED to COMMITTED, respectively. Any sequence of invocations to these methods ensures that all the COMMITTED nodes lie on the same branch of the tree (the trunk). In relation to protocol executions, add and commit invocations that concern the same node correspond to receiving a quorum of responses in two specific phases of a round, which vary from one protocol to another.

The mapping between protocol executions and QTree executions is defined as in proofs of linearizability for concurrent objects with fixed linearization points. Analogous to linearizability, where the goal is to show that an object method takes effect instantaneously at a point in time called linearization point, we show that it is possible to mark certain steps of a given protocol as linearization points of add or commit operations<sup>4</sup>, such that the sequence of add and commit invocations defined by the order between linearization points along a protocol execution is a correct QTree execution. We introduce a declarative characterization of correct QTree executions that simplifies the proof of the latter (see Section 3).

The QTree abstraction offers a novel view on the dynamics of classic consensus or state-machine replication protocols like Paxos, Raft, and PBFT, which relates to the description of recent Blockchain protocols like HotStuff and Bitcoin [27], i.e., agreeing on a branch in a tree. It provides a formal framework to reason uniformly about single-decree consensus protocols and state-machine replication protocols like Raft and HotStuff. For single-decree protocols (or compositions thereof), the parent-child relation between QTree nodes corresponds to the data flow between a quorum of responses to a leader and the request it sends in the next phase, while for Raft and HotStuff, it corresponds to an order set by a leader between different commands.

Our work relies on a hypothesis that correctness proofs based on establishing a refinement towards an operational specification such as QTree, which can be understood as a sequence of steps, are much more intuitive and "explainable" compared to classic proofs based on inductive invariants. An inductive invariant has to describe all intermediate states produced by all possible orders of receiving messages and a precise formalization is quite complex. As an indication, the Paxos invariant used in recent work [29] (see formulas (4) to (12) in Section 5.2) is a conjunction of eight quantified first-order formulas which are hard to reason about and not re-usable in the context of a different protocol.

We believe that operational specifications are also helpful in taming complexity while designing new protocols or implementations thereof, or in gaining confidence about their correctness without going through ad-hoc and brittle proof arguments. For instance, our proofs are very clear about the phases of a round in which quorums need to intersect, which provides flexibility and optimization opportunities for deciding on quorum sizes in each phase. Depending on environment assumptions, quorum sizes can be optimized while preserving correctness. Compared to previous operational specifications for reasoning about consensus protocols, e.g., [3,12], QTree is designed to be less abstract so that the refinement proof, establishing the relationship between a given protocol and QTree, is less complex (see Section 8 for details).

<sup>4</sup> These linearization points are fixed in the sense that they correspond to specific instructions in the code of the protocol, and they do not depend on the future of an execution. For an expert reader, this actually corresponds to a proof of strong linearizability [15].

# 2 Quorum Tree

We describe the QTree sequential object which operates on a tree and has two methods add and commit for adding a new node and modifying an attribute of a node (committing a node), respectively. When used as an abstraction of consensus protocols, invocations of these two methods correspond to certain quorums that are reached during a round of the protocol.

# 2.1 Overview

QTree is a sequential rooted tree, a possible state being depicted in Figure 1. The nodes with black dashed margins are not members of the tree and they are discussed later. Each node in the tree contains a round number, a value, and a status field set to ADDED, GHOST, or COMMITTED. The round number acts as an identifier of a node since there cannot exist two nodes with the same round number. The Root node is part of the initial state and its status is COMMITTED. A QTree state consists of a trunk, alive branches, and dead branches; a branch is a chain of nodes connected by the parent relation. Alive branches are extensible with new ADDED nodes but dead branches are not. The trunk is a particular branch of the tree that starts from the root. It contains all the COMMITTED nodes and it ends with a COMMITTED node. It may also contain ADDED or GHOST nodes. For example, in Figure 1, the trunk consists of Root and n<sub>3</sub>. All alive branches are connected to the last COMMITTED node of the trunk (alive branches can include ADDED or GHOST nodes). For instance, in Figure 1, the subtree rooted at n<sub>3</sub> contains a single alive branch whose leaf node is n<sub>5</sub>. Dead branches can contain only GHOST nodes. In Figure 1, the tree contains a single dead branch containing the node n<sub>1</sub>.

Nodes can be added to the tree as leaves. The status of a newly added node is either ADDED or GHOST. The status ADDED may turn to GHOST or COMMITTED. The GHOST status is "final" meaning that it can never turn into COMMITTED afterwards. However, GHOST nodes can be part of alive branches, and they can help in growing the tree.

QTree has two methods add and commit:

– add generates a new leaf with a round number r, value v, and parent p identified by the round number r<sub>p</sub> given as an input. Its status is set to ADDED or GHOST provided that some conditions hold. If the status of the new node is set as ADDED, then it either extends (has a path to the end of) an existing alive branch or creates a new alive branch from the trunk. The new node may also "invalidate" some other nodes by changing their status from ADDED to GHOST.

– commit extends the trunk by turning the status of a node from ADDED to COMMITTED. This extension of the trunk may prevent some branches from being extended in the future (some alive branches may become dead), i.e., future invocations of add that extend those branches will add only GHOST nodes.

Each node models the evolution of a round in a consensus protocol and the value attribute represents the value proposed by the leader of that round. The round and value attributes of a node are immutable and cannot be changed later. We assume that round numbers are strictly positive except for Root whose round number is 0.

QTree applies uniformly to a range of consensus or state-machine replication protocols. We start by describing a variation that applies to single-decree consensus protocols, where a number of processes aim to agree on a single value. Multi-decree consensus protocols that are used to solve state-machine replication can be simulated using a number of instances of QTree, one for each decree (the instances are independent one from another). Then, state-machine replication protocols like HotStuff that rely directly on a tree structure to order commands can be simulated by the QTree for single-decree consensus modulo a small change that we discuss later.

#### 2.2 Definition of the Single-Decree Version

Algorithm 1 lists a description of QTree in pseudo-code. The following set of predicates are used as conditions inside methods:


The add method (lines 5-17) generates a new node n with round, value, and parent set according to the method's inputs. Then, it adds n to the tree by linking it to the selected parent if n satisfies the following validity conditions:


# Algorithm 1: The QTree object

```
 1 Initialize:
       /* ⊥ denotes non-initialized values */
 2   Root.round = 0; Root.status = COMMITTED;
 3   Root.value = ⊥; Root.parent = Root;
 4   Nodes = {Root};
 5 Method add (r, v, rp)
 6   Pre: r > 0
 7   n = new Node(round = r, status = ⊥,
                  value = v, parent = p : p.round = rp);
 8   if valid(n) ∧ valueConstraint(n)
 9     Nodes = Nodes ∪ {n};
10     n.status = ADDED;
11     if ∃ n′ ∈ Nodes. n′.round > n.round
12       n.status = GHOST;
13     forall n′ ∈ Nodes. n′.round < n.round
14       if n is conflicting with n′
15         n′.status ← GHOST;
16     return OK
17   return FAIL
18 Method commit (r)
19   if ∃ n ∈ Nodes. n.round = r ∧ n.status = ADDED
20     n.status ← COMMITTED;
21     return OK
22   return FAIL
```
Fig. 1: A state of QTree. We represent ADDED nodes with green solid margins, GHOST nodes with red double-line margins, and COMMITTED nodes with blue thick margins. The nodes with black dashed margins are not part of the state; they are fictitious nodes used to explain the method for adding new nodes.

The valid predicate at (5) is the conjunction of the first three constraints.

For example, let us consider an invocation of add in a state of QTree that contains the non-dashed nodes in Figure 1. If the invocation generates n<sub>2</sub>, n<sub>4</sub>, or n<sub>6</sub> (receiving as input the corresponding attributes), then n<sub>2</sub> and n<sub>6</sub> do satisfy all these constraints and can be added to the tree. The node n<sub>4</sub> fails the extendsTrunk predicate because it is not extending the last node of the trunk (n<sub>3</sub>) and its round number is higher.

If a node n satisfies the conditions above, the add method turns its status to either ADDED or GHOST. If there is another node in the tree with a higher round number, n's status becomes GHOST. Otherwise, it becomes ADDED. As a continuation of the example above, the status of n<sub>2</sub> is set to GHOST because the tree contains node n<sub>3</sub> with a higher round number, and the status of n<sub>6</sub> is set to ADDED.

Moreover, the addition of n can "invalidate" some other nodes, i.e., turn their status to GHOST. This is based on a notion of conflicting nodes. We say that two nodes are conflicting if they are on different branches, i.e., there is no path from one node to the other. An add invocation that adds a node n changes to GHOST the status of all the nodes $n'$ in the tree that conflict with n and have a lower round number than n. For example, Figure 2 pictures a sequence of QTree states in an execution, to be read from left to right. The first state represents the result of executing add(1, $v_1$, 0) on the initial state of QTree, adding node $n_1$. Executing add(3, $v_2$, 0) on this first state creates another node $n_3$ and sets its status to ADDED. This invocation also turns the status of $n_1$ to GHOST since its round number is less than the round number of $n_3$ and they are on different branches. Afterwards, by executing add(2, $v_1$, 1), a node $n_2$ is added to the tree with status GHOST since there is a node $n_3$ on a different branch which has a higher round number.

Fig. 2: Explaining the behavior of add and commit methods. Colors are interpreted as in Fig. 1.

The method add returns OK when the created node is effectively added to the tree (it satisfies the conditions described above), and FAIL otherwise.

Lastly, the commit method takes a round number r as input and turns the status of the node with round number r to COMMITTED if it was ADDED. It returns OK if successful, and FAIL otherwise. As a continuation of the example above, the right part of Figure 2 pictures a state obtained by executing commit(3) on the state to the left. This sets the status of $n_3$ to COMMITTED as $n_3$ was previously ADDED. Note that the conditions in add ensure that the tree cannot contain two nodes with the same round number.
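The behavior of add and commit described above can be sketched in Python as follows. This is a minimal illustration, not the authors' artifact: the class and field names are hypothetical, and the bodies of the validity predicates (newRound, link, extendsTrunk) and of valueConstraint are assumptions reconstructed from the way the text uses them. In particular, extendsTrunk is checked here against every COMMITTED node with a smaller round number, which coincides with "extending the last node of the trunk" when the new round exceeds the trunk's highest round.

```python
from dataclasses import dataclass
from typing import Optional

ADDED, GHOST, COMMITTED = "ADDED", "GHOST", "COMMITTED"


@dataclass(eq=False)  # identity-based equality: nodes are compared by reference
class Node:
    round: int
    value: object
    parent: Optional["Node"]
    status: Optional[str] = None


class QTree:
    def __init__(self):
        self.root = Node(round=0, value=None, parent=None, status=COMMITTED)
        self.root.parent = self.root
        self.nodes = {0: self.root}  # round number -> node

    def _path_to_root(self, n):
        """Nodes on the path from n (inclusive) up to Root."""
        path = [n]
        while n is not self.root:
            n = n.parent
            path.append(n)
        return path

    def conflicting(self, a, b):
        """True when neither node lies on the other's path to Root."""
        return a not in self._path_to_root(b) and b not in self._path_to_root(a)

    def add(self, r, v, rp):
        assert r > 0  # precondition of add
        parent = self.nodes.get(rp)
        # newRound and link (assumed): r is fresh, the parent exists, rp < r.
        if r in self.nodes or parent is None or not rp < r:
            return "FAIL"
        n = Node(round=r, value=v, parent=parent)
        # extendsTrunk (assumed): every COMMITTED node with a smaller
        # round must be an ancestor of n.
        path = self._path_to_root(n)
        for m in self.nodes.values():
            if m.status == COMMITTED and m.round < r and m not in path:
                return "FAIL"
        # valueConstraint (assumed): children of Root may carry any value,
        # other nodes must repeat their parent's value.
        if parent is not self.root and parent.value != v:
            return "FAIL"
        self.nodes[r] = n
        n.status = ADDED
        if any(m.round > r for m in self.nodes.values()):
            n.status = GHOST
        for m in self.nodes.values():
            if m.round < r and self.conflicting(n, m):
                m.status = GHOST
        return "OK"

    def commit(self, r):
        n = self.nodes.get(r)
        if n is not None and n.status == ADDED:
            n.status = COMMITTED
            return "OK"
        return "FAIL"
```

Replaying the execution of Figure 2 on this sketch, i.e., `add(1, "v1", 0)`, `add(3, "v2", 0)`, `add(2, "v1", 1)`, and `commit(3)`, should leave the nodes with rounds 1 and 2 as GHOST and the node with round 3 as COMMITTED.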

Safety Properties. We show that the QTree object in Algorithm 1 can be used to reason about the safety of single-decree consensus protocols, in the sense that it satisfies a notion of Validity (processes agree on one of the proposed values) and Agreement (processes decide on a single value). More precisely, we show that every state that is reachable by executing a sequence of invocations of add and commit (in Algorithm 1), called simply reachable state, satisfies the following:


Proposition 1 (Validity). Every node in a reachable state that is different from Root contains the same value as a child of Root.

Proof. A node n is added to the tree only if the predicate valueConstraint holds, which implies that it is either a child of Root or it has the same value as its parent which is a descendant of Root. Also, since the value attribute of a node is immutable, any COMMITTED node contains the same value that it had when it was created by an add invocation.

Therefore, the fact that a consensus protocol refining QTree satisfies validity, i.e., processes decide on a value proposed by a client of the protocol, reduces to proving that the phases of a round simulated by add invocations that add children of Root use values proposed by a client. This is ensured using additional mechanisms, i.e., a client broadcasts its value to all participants in the protocol, so that each participant can check the validity of a value proposed by a leader.

Next, we focus on Agreement, and show that COMMITTED nodes belong to a single branch of the tree.

Proposition 2. Let $n_1$ and $n_2$ be two COMMITTED nodes in a reachable state. Then, $n_1$ and $n_2$ are not conflicting.

Proof. Assume towards contradiction that QTree reaches a state where two COMMITTED nodes $n_1$ and $n_2$ are conflicting. Let $r_1$ = $n_1$.round and $r_2$ = $n_2$.round. Without loss of generality, we assume that $r_1 < r_2$. Such a state is reachable only if add($r_1$, \_, \_) and add($r_2$, \_, \_) resulted in adding the nodes $n_1$ and $n_2$ and set their status to ADDED (we use \_ to denote arbitrary values), and subsequently, commit($r_1$) and commit($r_2$) switched the status of both $n_1$ and $n_2$ to COMMITTED. If add($r_1$, \_, \_) were to execute before add($r_2$, \_, \_), then add($r_2$, \_, \_) would have changed the status of $n_1$ to GHOST because it is conflicting with $n_2$. Otherwise, if add($r_2$, \_, \_) were to execute before add($r_1$, \_, \_), then the latter would have set the status of $n_1$ to GHOST since the tree contains $n_2$, which has a higher round number. In both cases, executing commit($r_1$) can never turn the status of $n_1$ to COMMITTED.

Proposition 2 allows us to conclude that any two COMMITTED nodes (different from Root) contain the same value. Indeed, a node can become COMMITTED only if it was ADDED, which implies that it has the same value as its parent (the predicate valueConstraint holds), and by transitivity, as any of its ancestors, except for Root.

Proposition 3 (Agreement). Let $n_1$ and $n_2$ be two COMMITTED nodes in a reachable state, which are different from Root. Then, $n_1$.value = $n_2$.value.

#### 2.3 State Machine Replication Versions

The single-decree version described above can be extended easily to a multi-decree context. As multi-decree consensus protocols, used in state machine replication, can be seen as a composition of multiple instances of single-decree consensus protocols, a multi-decree version of QTree is obtained by composing multiple instances of the single-decree version. Each of these instances manipulates a tree as described above without interference from other instances. The validity and agreement properties above apply separately to each instance.

The single-decree version can also be extended for state machine replication protocols like HotStuff and Raft where the commands (values) are a priori structured as a tree, i.e., each command given as input is associated with a predetermined parent in this tree. Then, the goal of such a protocol is to agree on a sequence in which to execute these commands, i.e., a branch in this tree. Simply removing the valueConstraint condition in the add method (underlined in Algorithm 1) enables QTree to simulate such protocols: a node's value need not be the same as its parent's value to be valid for add. Proposition 2, which implies the agreement property of such protocols, still holds (Proposition 3 does not hold when the valueConstraint condition is removed; this property is specific to single-decree consensus). Since the value field remains immutable, the validity property of such protocols reduces to ensuring that the values generated during phases simulated by add correspond to commands issued by the client (Proposition 1 is also specific to single-decree consensus and does not hold). As before, this requires additional mechanisms, e.g., a client broadcasting a command to all the participants in the protocol, whose correctness can be established quite easily.

# 3 Consensus Protocols Refining QTree

In the following, we show that a number of consensus protocols are refinements of QTree in the sense that their executions can be mimicked with add and commit invocations. This is similar to a linearizable concurrent object being mimicked with invocations of a sequential specification. The refinement relation allows us to conclude that the Validity and Agreement properties of QTree imply similar properties for any of its refinements.

The definition of the refinement relation relies on a formalization of protocols and QTree as labeled transition systems. For a given protocol, a state is a tuple of process local states and a set of messages in transit, and a transition corresponds to an indivisible step of a process (receiving a set of messages, performing a local computation step, or sending a message). For QTree, a state is a tree of nodes as described above and a step corresponds to an invocation to add or commit. An execution is a sequence of transitions from the initial state.

Refinement corresponds to a mapping between protocol executions and QTree executions. This mapping is defined as in proofs of linearizability for concurrent objects with fixed linearization points, where the goal is to show that each concurrent object method appears to take effect instantaneously at a point in time that corresponds to executing a fixed statement in its code. Therefore, certain steps of a given protocol are considered as linearization points of add and commit QTree invocations (returning OK), and one needs to prove that the sequence of invocations defined by the order of linearization points in a protocol execution is a correct execution of QTree.

Formally, a labeled transition system (LTS) is a tuple $L = (Q, q_0, T, A_L)$ where $Q$ is a set of states, $q_0$ is the unique initial state, $A_L$ is a set of actions (transition labels) and $T$ is a set of transitions $(q, a, q')$ such that $q, q' \in Q$ and $a \in A_L$. An execution $E$ from $q_0$ is a finite sequence of alternating states and actions $E = q_0, a_0, q_1, a_1, \ldots, q_n$ with $(q_i, a_i, q_{i+1}) \in T$ for each $0 \le i \le n-1$. A trace $t$ is the sequence of actions projected from some execution $E$. $T(L)$ denotes the set of traces of $L$.

The standard notion of refinement between LTSs states that an LTS $L$ is a refinement of another LTS $L'$ when $T(L) \subseteq T(L')$. In this paper, we consider a slight variation of this definition of refinement that applies to LTSs that do not share the same set of actions, representing, for instance, some concrete protocol and QTree, respectively. This notion of refinement is parametrized by a mapping $\Gamma$ between the actions of $L$ and $L'$. We say that $L$ $\Gamma$-refines $L'$ when $\Gamma(T(L)) \subseteq T(L')$. Here, a mapping $\Gamma : A_L \to A_{L'}$ is extended to sequences and sets of sequences as expected, e.g., $\Gamma(a_1 \ldots a_n) = \Gamma(a_1) \ldots \Gamma(a_n)$. With this extension, the preservation of safety specifications from an LTS to a refinement of it requires certain constraints on the mapping $\Gamma$ that will be discussed in Section 4.2.
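To make the definitions of traces and Γ-refinement concrete, the following Python sketch enumerates the traces of a small finite LTS up to a bounded length and checks the inclusion of the mapped traces. The class `LTS`, the function `gamma_refines`, and the bound `max_len` are hypothetical names introduced for illustration; since only traces of bounded length are explored, a positive answer is evidence of refinement up to that bound rather than a proof.

```python
class LTS:
    """A finite labeled transition system given by its initial state and transitions."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = set(transitions)  # triples (q, a, q')

    def traces(self, max_len):
        """All traces (action sequences) of executions with at most max_len steps."""
        result = {()}
        frontier = {((), self.initial)}
        for _ in range(max_len):
            nxt = set()
            for trace, q in frontier:
                for (src, a, dst) in self.transitions:
                    if src == q:
                        nxt.add((trace + (a,), dst))
            result |= {trace for trace, _ in nxt}
            frontier = nxt
        return result


def gamma_refines(concrete, abstract, gamma, max_len):
    """Bounded check that gamma applied to the traces of `concrete`
    yields only traces of `abstract`."""
    mapped = {tuple(gamma[a] for a in t) for t in concrete.traces(max_len)}
    return mapped <= abstract.traces(max_len)


# Toy example: an abstract system that allows add followed by commit,
# and two concrete systems whose actions are mapped by gamma.
abstract = LTS("q0", [("q0", "add", "q1"), ("q1", "commit", "q2")])
good = LTS("s0", [("s0", "propose", "s1"), ("s1", "decide", "s2")])
bad = LTS("s0", [("s0", "decide", "s1")])
gamma = {"propose": "add", "decide": "commit"}
```

In this toy example, `gamma_refines(good, abstract, gamma, 3)` holds, while `bad` is rejected because its mapped trace starts with commit, which the abstract system does not allow.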

In the context of proving that a concrete protocol refines QTree, the goal is to define a mapping Γ between actions of the protocol and QTree add/commit invocations such that Γ applied to protocol executions results in correct QTree executions. In the following, we provide a characterization of correct QTree executions that simplifies such refinement proofs.

#### 3.1 Characterizing QTree Invocation Sequences

An invocation label add(r, v, $r_p$) ⇒ RET or commit(r) ⇒ RET combines a QTree method name with input values and a return value RET ∈ {OK, FAIL}. An invocation label is called successful when the return value is OK. A sequence $\sigma$ of invocation labels is called correct when there exist QTree states $q_0, \ldots, q_{|\sigma|}$ such that $q_0$ is the QTree initial state and, for each $i \in [1, |\sigma|]$, executing $\sigma_i$ starting from $q_{i-1}$ leads to $q_i$.

Theorem 1. A sequence σ of successful invocation labels is correct if and only if the following hold (we use \_ to denote arbitrary values):


Properties 1–3 are straightforward consequences of the add and commit definitions. Indeed, it is impossible to add two nodes with the same round number r, which implies that there cannot be two successful add(r, \_, \_) invocations. The status of a node can be flipped to COMMITTED exactly once, which implies that there cannot be two successful commit(r) invocations. A commit(r) is successful only if a node with round number r already exists, hence Property 2 must hold. Moreover, a node's parent defined by the input $r_p$ must already exist in the tree, which implies that Property 3 must also hold. Property 4 is more involved and relies on the fact that a node n with round number r can be COMMITTED only if there exists no other conflicting node $n'$ with a bigger round number $r'$ (the parent of $n'$ having a round smaller than r implies that n and $n'$ are conflicting).
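As an illustration of how such a characterization can be used, the sketch below checks a sequence of successful invocation labels against Properties 1–4. The formulation of each property is an assumption paraphrased from the surrounding discussion and proofs (uniqueness per round, commit after add, parent added earlier with a smaller round, value inherited from a non-Root parent, and no add that "jumps over" a committed round), not a verbatim transcription of the theorem. Labels are encoded as hypothetical tuples `("add", r, v, rp)` and `("commit", r)`.

```python
def satisfies_properties(sigma, single_decree=True):
    """Check a sequence of successful invocation labels against an assumed
    reading of Properties 1-4 (Property 3a only when single_decree is True)."""
    adds = {}        # round -> (value, parent round)
    commits = set()  # committed rounds
    for label in sigma:
        if label[0] == "add":
            _, r, v, rp = label
            if r in adds:                        # Property 1: one add per round
                return False
            if rp != 0 and rp not in adds:       # Property 3: parent added earlier
                return False
            if rp >= r:                          # assumed: parent round is smaller
                return False
            if single_decree and rp != 0 and adds[rp][0] != v:
                return False                     # Property 3a: same value as parent
            adds[r] = (v, rp)
        else:
            _, r = label
            if r in commits:                     # Property 1: one commit per round
                return False
            if r not in adds:                    # Property 2: commit follows add
                return False
            commits.add(r)
    # Property 4 (assumed reading): no added node with round r2 > r whose
    # parent round is below a committed round r.
    for r in commits:
        for r2, (_, rp2) in adds.items():
            if rp2 < r < r2:
                return False
    return True


# The execution of Figure 2, encoded as invocation labels.
fig2 = [("add", 1, "v1", 0), ("add", 3, "v2", 0), ("add", 2, "v1", 1), ("commit", 3)]
```

Under this reading, `fig2` passes the check, whereas a sequence such as `add(1, v1, 0)`, `commit(1)`, `add(2, v2, 0)` is rejected by the Property 4 test, matching the intuition that a node cannot bypass a committed round.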

Proof. (⇒): Assume that σ is correct. We show that it satisfies the above properties:

	- Property 3a: It is a direct consequence of the valueConstraint(n) predicate used at line 8 in Algorithm 1.

(⇐): We prove that every sequence $\sigma$ that satisfies Properties 1–4 is correct. We proceed by induction on the size of $\sigma$. The base step is trivial. For the induction step, let $\sigma$ be a sequence of size $k + 1$. If $\sigma$ satisfies Properties 1–4, then the prefix $\sigma'$ containing the first $k$ labels of $\sigma$ satisfies Properties 1–4 as well. By the induction hypothesis, $\sigma'$ is correct. We show that the last invocation of $\sigma$, denoted by $\sigma_{k+1}$, can be executed in the QTree state $q_{|\sigma'|}$ reached after executing $\sigma'$. We start with a lemma stating an inductive invariant for reachable QTree states:

Lemma 1. For every node n in any state q reached after executing a correct sequence $\sigma$ of successful invocations, n.status is COMMITTED if n is Root or $\sigma$ contains a commit(r) invocation with r = n.round. Otherwise, n.status is GHOST if q contains a node $n'$ with $n'$.round > n.round such that $n'$ is conflicting with n, and it is ADDED otherwise.

Proof. We proceed by induction on the size of $\sigma$. The base step is trivial. For the induction step, let $\sigma$ be a sequence of size $m + 1$. Let $q_m$ be the state reached after executing the prefix of size $m$ of $\sigma$, and let $\sigma_{m+1}$ be the last invocation label of $\sigma$. We show that the property holds for any possible $\sigma_{m+1}$ that takes the QTree state $q_m$ to some other state $q_{m+1}$:


There are two cases to consider depending on whether $\sigma_{k+1}$ is an add or commit invocation label:

	- newRound(n): Due to Property 1, r ≠ $n'$.round for any other node $n' \in q_{|\sigma'|}$ and the predicate is satisfied.
	- link(n): To satisfy this predicate, there must exist a node in $q_{|\sigma'|}$ with round $r_p$ where $r_p < r$. By Property 3, if $\sigma$ contains add(r, \_, $r_p$) ⇒ OK with $r_p \neq 0$, then add($r_p$, \_, \_) ⇒ OK also exists in $\sigma$. Hence, there exists a node p with round $r_p$ in $q_{|\sigma'|}$, and the predicate is satisfied. If $r_p = 0$, then $q_{|\sigma'|}$ contains the Root node (with round 0), which ensures that the predicate is satisfied.
	- extendsTrunk(n): This predicate states that n extends the node $n'$ which has the highest round number among the nodes with COMMITTED status, if n.round > $n'$.round. Assume by contradiction that this is not the case, i.e., n.round > $n'$.round but n and $n'$ are conflicting. Let $n_1$ be the lowest common ancestor of n and $n'$ (the first common node on the paths from n and $n'$ to the Root). Since the round numbers decrease when going from one node towards Root, we have that $n_1$.round < $n'$.round. If we consider the nodes on the path from n to $n_1$, since n.round > $n'$.round, there must exist a node $n_2$ such that $n_2$.round > $n'$.round but $n_2$.parent.round < $n'$.round. The node $n_2$ in $q_{|\sigma'|}$ corresponds to the invocation label add($n_2$.round, \_, $n_2$.parent.round) in $\sigma'$. Moreover, the COMMITTED status of $n'$ implies the existence of commit($n'$.round) in $\sigma'$ as stated in Lemma 1. However, it is impossible that $\sigma'$ contains both these invocation labels if Property 4 holds.


# 4 Linearization Points

We describe an instrumentation of consensus protocols with linearization points of successful QTree invocations, and illustrate it using Paxos as a running example. Section 5 and Section 6 will discuss other protocols like HotStuff, Raft, PBFT, and multi-Paxos. This instrumentation defines the mapping Γ between actions of a protocol and QTree, respectively, such that the protocol is a Γ-refinement of QTree. We also discuss the properties of this instrumentation which imply that establishing Γ-refinement is an effective proof for the safety of the protocol.

The identification of linearization points relies on the fact that protocol executions pass through a number of rounds, and each round goes through several phases (rounds can run asynchronously – processes need not be in the same round at the same time). The protocol imposes a total order over the phases inside a round and among distinct rounds. Processes executing the protocol can only move forward following the total order on phases/rounds. Going from one phase to the next phase in the same round is possible if a quorum of processes send a particular type of message. The refinement proofs require identifying two quorums for each round where a value is first proposed to be agreed upon and then decided. They correspond to linearization points of successful add(r, \_, \_) and commit(r), respectively. The linearization point of add(r, v, $r_p$) ⇒ OK occurs when, intuitively, the value v is proposed as a value to agree upon in round r. For the protocols we consider, v is determined by a designated leader after receiving a set of messages from a quorum of processes. For single-decree consensus, members of the quorum send the latest round number and value they adopted (voted) in the past and the leader picks a value corresponding to the maximum round number $r_p$. If no one in the quorum has adopted any value yet, then the leader is free to propose any value received from a client, and $r_p$ equals a default value 0. For state-machine replication protocols like HotStuff or Raft, the round $r_p$ is defined in a different manner – see Section 5 (and the full version of this work [9]). The linearization point of commit(r) ⇒ OK occurs when a quorum of nodes adopt (vote for) a value v proposed at round r.
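For single-decree consensus, the leader's choice of the value and of the parent round described above can be sketched as follows. The function name `choose_proposal` and the message encoding (pairs of last voted round and value, with round 0 standing for "no vote yet") are assumptions made for illustration; the function returns the inputs of the add invocation whose linearization point the proposal would represent.

```python
def choose_proposal(r, quorum_msgs, client_value):
    """Pick the proposal for round r from the messages of a quorum.

    quorum_msgs: list of (last_voted_round, last_voted_value) pairs, where
    last_voted_round == 0 encodes that the sender has not voted yet (assumed
    encoding). Returns the triple (r, v, rp) passed to add.
    """
    voted = [(vr, v) for vr, v in quorum_msgs if vr > 0]
    if not voted:
        # Nobody in the quorum adopted a value: propose a client value,
        # with the default parent round 0 (a child of Root).
        return (r, client_value, 0)
    # Otherwise adopt the value voted in the maximum round rp.
    rp, v = max(voted, key=lambda m: m[0])
    return (r, v, rp)
```

For example, with replies `[(0, None), (2, "a"), (3, "b")]` in round 5, this sketch proposes `"b"` with parent round 3, and with no prior votes it falls back to the client's value with parent round 0.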

By Theorem 1, proving that the order between linearization points along a protocol execution defines a correct QTree execution reduces to showing Properties 1–4. In general, Properties 1–3 are quite straightforward to establish and follow from the control-flow of a process. Property 3a is specific to single-decree consensus protocols or compositions thereof, e.g., (multi-)Paxos and PBFT. It will not hold for Raft or HotStuff. Property 4 is related to the fact that any two quorums of processes intersect in a correct process.

Above, we have considered the case of a protocol that is a refinement of a single instance of QTree. State machine replication protocols that are composed of multiple independent consensus instances, e.g., PBFT (see Section 6), are refinements of a set of QTree instances (identified using a sequence number) and every linearization point needs to be associated with a certain QTree instance.

#### 4.1 Linearization Points for Paxos

For concreteness, we exemplify the instrumentation with linearization points on the single-decree Paxos protocol. We start with a brief description of this protocol that focuses on details relevant to this instrumentation.

Paxos proceeds in rounds and each round has a unique leader. Since the set of processes running the protocol is fixed and known by every process, the leader of each round can be determined by an a priori fixed deterministic procedure (e.g., the leader is defined as r mod N where r is the round number and N the number of processes). For each round, the leader acts as a proposer of a value to agree upon.

A round contains two phases. In the first phase, the leader broadcasts a START message to all the processes to start the round, executing the **START** action below, and processes acknowledge with a JOIN message if some conditions are met, executing the **JOIN** action:


<sup>5</sup> Each process has a local variable maxJoinedRound that stores the maximal round it has joined or voted for in the past and checks whether maxJoinedRound < r

If the leader receives JOIN messages from a quorum of processes, i.e., at least f +1 processes from a total number of 2f +1, the second phase starts. The leader broadcasts a PROPOSE message with a value, executing the **PROPOSE** action below. Processes may acknowledge with a VOTE message if some conditions are met, executing a **VOTE** action. If the leader receives VOTE messages from a quorum of processes, then the proposed value becomes decided (and sent to the client) by executing a **DECIDE** action:


Linearization points in Paxos. We instrument Paxos with linearization points as follows:


We illustrate the definition of linearization points for Paxos in relation to QTree executions in the full version [9].

#### Theorem 2. Paxos refines QTree.

Proof. We show that the sequence of successful add and commit invocations defined by linearization points along a Paxos execution satisfies the properties in Theorem 1 and therefore, it represents a correct QTree execution:

– Property 1: Each round has a unique leader and the leader follows the rules of the protocol (no Byzantine failures), thereby, making a single proposal. Therefore, the linearization point of an add(r, \_, \_) ⇒ OK will occur at most once for a round r. Since a single value can be proposed in a round, and all processes follow the rules of the protocol, they can only vote for that single value. Thus, at most one linearization point of commit(r) ⇒ OK can occur for a round r.

	- Property 3a: By the definition of **PROPOSE**, the proposer selects the JOIN message with the highest vote round number and proposes its value. Thus, if the linearization points of both add(r, v, $r'$) ⇒ OK and add($r'$, $v'$, \_) ⇒ OK occur, then $v = v'$.

The proof of Property 4 relies exclusively on the quorum of processes in the first phase of a round intersecting the quorum of processes in the second phase of a round. It is not needed that quorums in first, resp., second, phases of different rounds intersect. This observation is at the basis of an optimization that applies to non-Byzantine protocols like Flexible Paxos [18] or Raft (see the full version [9]).

#### 4.2 Inferring Safety

The main idea behind these linearization points is that successful add and commit invocations correspond to some process doing a step that witnesses the receipt of a quorum of messages sent in a certain phase of a round. Intuitively, the linearization point of a successful add invocation occurs when some process in some round is certain that a quorum of processes received or will receive the same proposal (same value, parent, etc.) for the same round and acts accordingly (sends a message). Such a proposal of a value v in a round r is denoted by the linearization point of a successful add(r, v, $r'$) for some $r'$. On the other hand, the linearization point of a successful commit(r) invocation occurs when a process decides on a value in round r (e.g., after receiving a quorum of votes). Formally, if we denote the actions of a protocol that correspond to linearization points of successful add(r, v, $r'$) and commit(r) invocations using $a_a$ and $a_c$, respectively, then $\Gamma(a_a)$ = add(r, v, $r'$) ⇒ OK and $\Gamma(a_c)$ = commit(r) ⇒ OK.

When the protocol is such a Γ-refinement of QTree, it satisfies agreement and validity. If a decision on a value v in a round r of a protocol is the linearization point of a successful commit(r), then by Theorem 1, the corresponding QTree state contains a node n with n.round = r, n.value = v, and n.status = COMMITTED. For single-decree consensus, Proposition 3 ensures that all rounds decide on the same value. For state machine replication protocols like Raft and HotStuff, where the goal is to agree on a sequence of commands, Proposition 2 ensures that all the decided values lie on the same branch of the tree, which ensures that all processes agree on the same sequence of commands.

For validity, when valueConstraint(n) is considered, successful add(r, v, 0) invocations represent proposals of client values. Theorem 1 ensures that these invocations correspond to nodes n that are immediate children of Root and for any such node n, n.value = v. Therefore, by Proposition 1, we can conclude that only client values can be decided. When valueConstraint(n) is not considered, the fact that the value of each node is obtained from a client is ensured using additional mechanisms that are straightforward, e.g., a client broadcasting a command to all the participants in the protocol.

# 5 HotStuff Refines QTree

We present an instrumentation of HotStuff with linearization points of successful add and commit invocations. We use HotStuff as an example of a state machine replication protocol where processes agree over a sequence of commands to execute, and any new command proposed by a leader to the other processes comes with a well-identified immediate predecessor in this sequence. Agreement over a command entails agreement over all its predecessors in the sequence. This is different from protocols such as multi-Paxos or PBFT, discussed in the next section, where commands are associated to indices in the sequence and they can be agreed upon in any order. Instrumentation of Raft is presented in the full version [9] and behaves in a similar manner.

In HotStuff, f out of a total of N = 3f + 1 processes might be Byzantine in the sense that they might show arbitrary behavior and send corrupt or spurious messages. However, they are limited by cryptographic protocols. HotStuff requires that messages are signed using public-key cryptography, which implies that Byzantine processes cannot imitate messages of correct (non-faulty) processes. Additionally, after receiving a quorum of messages, leaders must include certificates in their own messages to prove that a quorum has been reached. These certificates are constructed using threshold signature schemes and correct processes will not accept any message from the leader if it is not certified. Because of Byzantine processes, HotStuff requires quorums of size 2f + 1, which ensures that the intersection of any two quorums contains at least one correct process.
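The quorum intersection argument is simple counting: two quorums of size q among N processes share at least 2q − N members. The following sketch (with hypothetical helper names) checks this for the Byzantine setting above, where N = 3f + 1 and q = 2f + 1 give an overlap of at least f + 1 processes, hence at least one correct process, and for the crash-fault setting of Paxos, where N = 2f + 1 and q = f + 1 give an overlap of at least one process. The brute-force function is only meant as a sanity check for small N.

```python
from itertools import combinations


def min_quorum_overlap(n, q):
    """Smallest possible intersection of two quorums of size q among n processes."""
    return max(0, 2 * q - n)


def brute_force_overlap(n, q):
    """Same quantity, computed by enumerating all pairs of quorums (small n only)."""
    procs = range(n)
    return min(
        len(set(a) & set(b))
        for a in combinations(procs, q)
        for b in combinations(procs, q)
    )


for f in (1, 2):
    # Byzantine setting: the overlap exceeds the number f of faulty processes.
    assert min_quorum_overlap(3 * f + 1, 2 * f + 1) == f + 1
    assert brute_force_overlap(3 * f + 1, 2 * f + 1) == f + 1
    # Crash-fault setting: any two quorums share at least one process.
    assert min_quorum_overlap(2 * f + 1, f + 1) == 1
```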

Each process stores a tree of commands. When a node in this tree (representing some command) is decided, all the ancestors of this node in the tree (nodes on the same branch) are also decided. For a node to become decided, a leader must receive a quorum of messages in 3 consecutive phases after the proposal. After each quorum is established, the leader broadcasts a different certificate to state which quorum has been achieved and the processes update different local variables accordingly, with the same node (if the certificate is valid). These local variables are preNode, votedNode and decidedNode in the order of quorums.

To start a new round, processes send their preNodes to the leader of the next round in ROUND-CHANGE(r) messages and increment their round number. After getting a quorum of messages and selecting the preNode with the highest round, the leader broadcasts a PROPOSE(r) message with a new node (value is taken from the client) whose parent is the selected preNode. When the message is received by a process, it first checks if the new node extends the selected preNode. Then it accepts the new node if the node extends its own votedNode (it is a descendant of votedNode in the tree) or it has a higher round number than the round number of its votedNode, and sends<sup>6</sup> a JOIN(r) message with the same content. In the second (resp., third) phase, if a quorum of JOIN(r) (resp., PRECOMMIT\_VOTE(r)) messages is received by the leader, it broadcasts a PRECOMMIT(r) (resp., COMMIT(r)) message, and processes update their preNode (resp., votedNode) with the new node, sending a PRECOMMIT\_VOTE(r) (resp., COMMIT\_VOTE(r)) message. In the fourth phase, when the leader receives a quorum of COMMIT\_VOTE(r) messages, it broadcasts a DECIDE(r) message and processes update their decidedNode accordingly. See the full version [9] for more details.

For HotStuff, the linearization points of add and commit occur with the broadcasts of PRECOMMIT(r) and DECIDE(r) messages, respectively, that are valid, i.e., (1) they contain certificates for quorums of JOIN(r) or COMMIT\_VOTE(r) messages, respectively, which respect the threshold signature scheme, and (2) they contain the same node as in those messages. More precisely,


<sup>6</sup> For all received messages, a correct process also checks if the round number of the node sent by the leader is equal to the current round number of its own, and can send only one message for each phase in each round.

Note that a Byzantine leader can send multiple valid PRECOMMIT(r) messages that include certificates for different quorums of JOIN(r) messages. A linearization point occurs when the first such message is sent. Even if processes reply to another valid PRECOMMIT(r) message sent later, this later PRECOMMIT(r) message contains the same preNode value, and their reply will have the same content. The same holds for DECIDE(r) messages. This remark, along with the restriction to valid messages and the fact that any two quorums intersect in at least one correct process, implies that the sequence of successful add and commit invocations defined by these linearization points satisfies the properties in Theorem 1 and therefore,

#### Theorem 3. HotStuff refines QTree.

A detailed proof of the theorem above is given in the full version [9].

# 6 PBFT Refines QTree

The protocols discussed above are refinements of a single instance of QTree. State-machine replication protocols based on multi-decree consensus, like Multi-Paxos or PBFT, can be seen as compositions of a number of single-decree consensus instances that run concurrently, one for each index in a sequence of commands to agree upon, and they are refinements of a set of independent QTree instances. We describe the instrumentation of PBFT and delegate Multi-Paxos (and variants) to the full version [9].

PBFT is a multi-decree consensus protocol in which processes aim to agree on a sequence of values. As in HotStuff, f out of a total number of 3f + 1 processes might be Byzantine and quorums are of size at least 2f + 1. To ensure authentication, messages are signed using public-key cryptography. Messages sent after receiving a quorum of messages in a previous phase include that set of messages as a certificate.

A new round r starts with the leader receiving a quorum of ROUND-CHANGE(r) messages (like in HotStuff). Each such message from a process p includes the VOTE message with the highest round (similarly to the **JOIN** action of Paxos) that p sent in the past, for each sequence number that is not yet agreed by a quorum. For an arbitrary set of sequence numbers sn, the leader selects the VOTE message with the highest round and broadcasts a PROPOSE(r,sn) message that includes the same value as in the VOTE message or a value received from a client if there is no such highest round. As mentioned above, this message also includes the VOTE messages that the leader received as a certificate for the selection. When a process receives a PROPOSE(r,sn) message, if r equals its current round, the process did not already acknowledge a PROPOSE(r,sn) message, and the value proposed in this message is selected correctly w.r.t. the certificate, then it broadcasts a JOIN(r,sn) message with the same content (this is sent to all processes not just the leader). If a quorum of JOIN(r,sn) messages is received by a process, then it broadcasts a VOTE(r,sn) message with the same content. If a process receives a quorum of VOTE(r,sn) messages, then the value in this message is decided for sn. When a process sends its highest round number VOTE messages to the leader of the next round (in ROUND-CHANGE messages), it also includes the quorum of JOIN messages that it received before sending the VOTE, as a certificate.

PBFT is a refinement of a set of independent QTree instances, one instance for each sequence number. The linearization points refer to a specific instance identified by a sequence number, e.g., sn.add(r, v, r′) denotes an add(r, v, r′) invocation on the QTree instance sn. Accordingly, a protocol refines a set of QTree instances identified by sequence numbers when it satisfies Properties 1-4 in Theorem 1 for each sequence number. For example, Property 1 becomes: for every sn and every r, a protocol execution contains a linearization point for at most one invocation sn.add(r, \_, \_) ⇒ OK and at most one invocation sn.commit(r) ⇒ OK. A detailed proof of the following theorem is given in the full version [9].

Theorem 4. PBFT refines a composition of independent QTree instances.
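The per-instance reading of Property 1 can be illustrated with a small checker over a list of linearization points. This is only a sketch under assumptions: the tuple encoding `(sn, op, r)` of a successful invocation and the function name are hypothetical, and the other three properties are not covered.

```python
from collections import Counter
from typing import Iterable, Tuple

# (sequence number, operation name, round) of a successful invocation,
# e.g. (0, "add", 1) stands for sn.add(1, _, _) => OK on instance sn = 0.
LinPoint = Tuple[int, str, int]


def satisfies_property_1(points: Iterable[LinPoint]) -> bool:
    """True if each (sn, op, r) occurs at most once among the linearization points."""
    counts = Counter(points)
    return all(count <= 1 for count in counts.values())
```

Two successful add invocations in the same round violate the property only when they target the same sequence number; on different instances they are independent.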

# 7 Discussion

Protocols considered in this work can be grouped into three classes: single-decree consensus (Paxos), multi-decree consensus (PBFT, Multi-Paxos) and state machine replication (Raft, HotStuff)<sup>7</sup>. We show that they all refine QTree: a single instance for Paxos and HotStuff, and a set of independent instances (one for each sequence number in a command log) for PBFT, Multi-Paxos, and Raft. The more creative parts of the refinement proofs are identifying the add and commit linearization points and establishing Property 4 in Theorem 1, which follows from the intersection of quorums achieved in different phases of a round. The other three properties in Theorem 1, which guarantee that the linearization points are correct, are established in a rather straightforward manner from the control flow of a process participating in the protocol.

The linearization points of successful add and commit invocations correspond to some process taking a step that witnesses the receipt of a quorum of messages sent in a certain phase of a round, e.g., the leader broadcasting a PROPOSE(r) message in Paxos entails that a quorum of JOIN(r) messages has been sent in the first phase and received. Protocols vary in the total number of phases in a round, and in the phases for which quorums of sent messages must be received in order to have a linearization point of add or commit. A summary is presented in Table 1. The \* on the total number of phases means that the first phase is skipped in rounds where the leader is stable. For Multi-Paxos and Raft, if the first phase is skipped, then the linearization point of an add is determined by a quorum of received messages sent in the next phase (and coincides with the linearization point of a commit). We use "1/2" to denote this fact. In PBFT and HotStuff, due to Byzantine processes, quorums of messages sent in two consecutive phases need to be received in order to ensure that the processes are going to vote on the same valid proposal. The 3rd phase in HotStuff is used to ensure progress and can be omitted when reasoning only about safety.

<sup>7</sup> This is a slight abuse of terminology since multi-decree consensus protocols are typically used to implement state machine replication.

Table 1: Summary of linearization point definitions. For each protocol, we give the total number of phases in a round and the number of the phase for which a quorum of sent messages should be received in order to have a linearization point of add or commit.


# 8 Conclusion and Related Work

We have proposed a new methodology for proving safety of consensus or state-machine replication protocols, which relies on a novel abstraction of their dynamics. This abstraction is defined as a sequential QTree object whose state represents a global view of a protocol execution. The operations of QTree construct a tree structure and model agreement on values or a sequence of state-machine commands as agreement on a fixed branch in the tree. Our methodology applies uniformly to a range of protocols like (Multi-)Paxos, HotStuff, Raft, and PBFT. We believe that this abstraction helps in improving the understanding of such protocols and in writing correct implementations or optimizations thereof.

As a limitation, it is not clear whether QTree applies to protocols such as Texel [31] which do not admit a decomposition in rounds. As future work, we might explore the use of QTree in reasoning about liveness. This would require some fairness condition on infinite sequences of add/commit invocations, and a suitable notion of refinement which ensures that infinite sequences of protocol steps cannot be mapped to infinite sequences of stuttering QTree steps.

The problem of proving the correctness of such protocols has been studied in previous work. We give an overview of the existing approaches that starts with safety proof methods based on refinement, which are closer to our approach.

Refinement-based safety proofs. Verdi [35] is a framework for implementing and verifying distributed systems that contains formalizations of various network semantics and failure models. Verdi provides system transformers useful for refining high-level specifications to concrete implementations. As a case study, it includes a fully mechanized correctness proof of Raft [36]. This proof consists of 45000 lines of proof code (manual annotations) in the Coq language for a 5000-line Raft implementation, showing the difficulty of reasoning about consensus protocols and the manual effort required. IronFleet [17] uses TLA-style [22] transition-system specifications and refines them to low-level implementations described in the Dafny programming language [25]. Boichat et al. [3] define a class of specifications for consensus protocols, which are more abstract than QTree and can make correctness proofs harder. Proving Paxos in their case is reduced to a linearizability proof towards an abstract specification, which is quite complex because the linearization points are not fixed: they depend on the future of an execution. As a possibly superficial quantitative measure, their Paxos proof reduces to 7 lemmas that are formalized by Garcia-Perez et al. [12,13] in 12 pages (see Appendix B and C in [13]), much more than our QTree proof. Our refinement proof is also similar to a linearizability proof, but the linearization points in our case are fixed (they do not depend on the future of an execution), which makes the proof simpler. In principle, the specifications in [3] could apply to more protocols, but we are not aware of such a case. The inductive sequentialization proof rule [20] is used for a fully mechanized correctness proof of a realistic Paxos implementation. This implementation is proved to be a refinement of a sequential program which is quite close to the original implementation, much less abstract than QTree, and relies on commutativity arguments implied by the communication-closed round structure [11]. A similar idea is explored in [14], but in a more restricted context.

Inductive invariant based safety proofs. Ivy [30] is an SMT-based safety verification tool that can be used for verifying inductive invariants about global states of a distributed protocol. In order to stay in a decidable fragment of first-order logic, both the modeling and the specification language of IVY are restricted. A simple model of Paxos obeying these restrictions is proven correct in [29].

Beyond safety. The TLA+ infrastructure [22] of Lamport has been used to verify both safety and liveness (termination) of several variations of Paxos, e.g., Fast Paxos [23] or Multi-Paxos [6]. Bravo et al. [4] introduce a generic synchronization mechanism for round changes, called the view synchronizer, which guarantees liveness for various Byzantine consensus protocols including our case studies HotStuff and PBFT. This work includes full correctness proofs for single-decree versions of HotStuff and PBFT and a two-phase version of HotStuff. PSync [10] provides a partially synchronous semantics for distributed protocols assuming communication-closed rounds in the Heard-Of model [8]. PSync is used to prove both safety and liveness of a Paxos-like consensus protocol called lastVoting.

Relating different consensus protocols. Lamport defines a series of refinements of Paxos that leads to a Byzantine fault tolerant version, which is refined by PBFT [24]. Our proof that Paxos refines QTree can be easily extended to this Byzantine fault tolerant version in the same manner as we did for PBFT. Wang et al. [34] show that a variation of Raft is a refinement of Paxos, which enables porting some Paxos optimizations to Raft. Renesse et al. [32] compare Paxos, Viewstamped Replication [28] and ZAB [19]. They define a rooted tree of specifications represented in TLA style whose leaves are concrete protocols. Each node in this tree is refined by its children. Common ancestors of concrete protocols show similarities whereas conflicting specifications show the differences. Similarly, [33] shows that the Paxos, Chandra-Toueg [7] and Ben-Or [2] consensus algorithms share common building blocks. Aublin et al. [1] propose an abstract data type for specifying existing and possible future consensus protocols. Unlike our QTree, core components of this data type are not implemented and are intentionally left abstract so that it can adapt to different network and process failure models.

# References



# MAGπ: Types for Failure-Prone Communication

Matthew Alan Le Brun and Ornela Dardha

University of Glasgow, Glasgow, UK m.le-brun.1@research.gla.ac.uk ornela.dardha@glasgow.ac.uk

Abstract. Multiparty Session Types (MPST) are a typing discipline for communication-centric systems, guaranteeing communication safety, deadlock freedom and protocol compliance. Several works have emerged which model failures and introduce fault-tolerance techniques. However, such works often make assumptions on the underlying network, e.g., assuming TCP-based communication where messages are guaranteed to be delivered; or adopting centralised reliable nodes and ad-hoc notions of reliability; or only addressing a single kind of failure, such as node crashes. In this work, we develop MAGπ, a Multiparty, Asynchronous and Generalised π-calculus, which is the first language and type system to accommodate in unison: (i) the widest range of non-Byzantine faults, including message loss, delays and reordering; crash and link failures; and network partitioning; (ii) a novel and most general notion of reliability, taking into account the viewpoint of each participant in the protocol; (iii) a spectrum of network assumptions from the lowest UDP-based network programming to the TCP-based application level. We prove subject reduction and session fidelity; process properties (deadlock freedom, termination, etc.); failure-handling safety and reliability adherence.

Keywords: Session types · Distributed protocols · Failures · Timeouts

# 1 Introduction

Despite large investments into fault-prevention techniques, failures still regularly occur in complex distributed applications. It is widely agreed that traditional methods of verification using software testing do not provide high levels of confidence in the correctness of distributed algorithms. This is mainly due to the nondeterministic behaviour inherent to these protocols, which makes it infeasible to manually test for all edge cases. This problem is bypassed by using exhaustive techniques such as model checking [9,31], capable of exploring the entirety of the state space of a program to verify its correctness. However, building suitable models for complex distributed algorithms is arduous, expensive, and often intractable (due to the state explosion problem [10]). Furthermore, even if an algorithm is successfully encoded into a suitable model and checked, guarantees of correctness are on the design of the algorithm, and not on the software implementation; handwritten code is still prone to human error. In contrast, types and type systems [29] are lightweight forms of verification. Baked into programming languages, types provide guarantees directly on handwritten code and aid developers in implementing software which is correct by construction. Specific to concurrent and distributed computing, session types [14,35,15,36,33,16] have quickly grown in popularity since their initial conceptualisation [14], spanning from binary (two participants) to multiparty (many participants).

Session types enforce that processes communicate according to a protocol specification. Consequently, desirable properties about communication, e.g., type safety (communication occurs error-free), protocol compliance (or session fidelity; processes behave according to their predefined protocol), and deadlock freedom (processes do not get stuck waiting), can be statically determined by a type checker. To this aim, session types have been implemented in various programming languages, including Java [18,11], Go [21], Haskell [17,27], Scala [32], Rust [19], Elixir [34].

To date, most session type theories are designed for concurrent, as opposed to distributed, processes; i.e., it is commonly assumed that communication failures do not occur. For the few (and rapidly increasing) works that do consider failures, heavy assumptions are made that impede their viability for realistic complex distributed applications. E.g., asynchronous theories [24,16,33] use message buffers to model distributed communication under "TCP-like" assumptions: messages are guaranteed to be delivered and messages from a single sender do not get reordered. Affine sessions [25,12,6] only allow failure-handling of application level failures through try-catch blocks; there is no support for arbitrary failures that may stem from hardware faults, network inconsistencies etc. Coordinator model approaches [1,8,37] assume some degree of reliability, be it as a central resilient process, a reliable broadcast, or fixed synchronisation points.

The harsh reality is that many real-world distributed protocols (e.g., consensus algorithms) cannot assume any of these conditions. Networks introduce many points of failure into a system: nodes may crash, messages can be dropped, delayed or duplicated, links between nodes may fail etc. Designers of distributed protocols have acknowledged that failure is inevitable, and so algorithms are designed to withstand a threshold of failure whilst still achieving their expected behaviour—known as fault-tolerance [22]. Examples of fault-tolerant protocols (extensively) used today include the Paxos [20] and Raft [26] consensus algorithms, which assume the possibility of all non-Byzantine faults—i.e., node crashes, link failures, network partitions, and message inconsistencies.

Although the correctness of these algorithms has been heavily studied, many of them are developed with limited confidence in the correctness of the deployable artifact, due to the reasons previously outlined. To fill this gap, we need type-based verification, which can be made available to programming languages, thus supporting designers and developers in designing and implementing correct distributed algorithms. While (multiparty) session types have made great impact in modelling structured communication and guaranteeing relevant properties, their theory is not yet expressive enough to model these complex algorithms.

In this paper, we take steps towards filling this gap by presenting MAGπ, a Multiparty, Asynchronous and Generalised π-calculus: the first language and type system able to accommodate: (i) the widest range of non-Byzantine faults, including message loss, delays and reordering; crash and link failures; and network partitioning (all by using timeouts); (ii) a novel and most general notion of reliability, taking into account the viewpoint of each participant in the protocol; and (iii) a spectrum of network assumptions, from the lowest level of network programming based on UDP, to application level based on TCP.

Example 1 (Ping Pong: Types). We illustrate MAGπ with a simplified version of the ping utility from the Internet Control Message Protocol (ICMP<sup>1</sup>), which is our running example. The ping utility consists of a total of three roles communicating amongst each other: two roles, p and r, communicate reliably with each other, and both communicate unreliably with a third role q. Our definition of reliability (§ 3.2) takes into account the viewpoint of each role, thus allowing roles to have their own (possibly empty) reliability set. Following the assumptions above, the reliability set for p is {r}, for r is {p}, and for q is ∅.

Below we give the session types, denoted S<sub>r</sub>, S<sub>p</sub> and S<sub>q</sub> for roles r, p and q respectively; ⏱ marks a timeout branch.

$$\mathbb{S}\_{\mathsf{r}} = \& \left\{\ \mathsf{p}\,?\,\mathsf{ok}().\mathsf{end},\ \ \mathsf{p}\,?\,\mathsf{ko}().\mathsf{end}\ \right\}$$

$$\mathbb{S}\_{\mathsf{p}} = \mathsf{q}\,!\,\mathsf{ping}().\& \begin{cases} \mathsf{q}\,?\,\mathsf{pong}().\mathsf{r}\,!\,\mathsf{ok}().\mathsf{end},\\ \text{⏱}.\,\mathsf{q}\,!\,\mathsf{ping}().\& \begin{cases} \mathsf{q}\,?\,\mathsf{pong}().\mathsf{r}\,!\,\mathsf{ok}().\mathsf{end},\\ \text{⏱}.\,\mathsf{q}\,!\,\mathsf{ping}().\& \begin{cases} \mathsf{q}\,?\,\mathsf{pong}().\mathsf{r}\,!\,\mathsf{ok}().\mathsf{end},\\ \text{⏱}.\,\mathsf{r}\,!\,\mathsf{ko}().\mathsf{end} \end{cases} \end{cases} \end{cases}$$

$$\mathbb{S}\_{\mathsf{q}} = \& \begin{cases} \mathsf{p}\,?\,\mathsf{ping}().\mathsf{p}\,!\,\mathsf{pong}().\mathsf{end},\\ \text{⏱}.\,\& \begin{cases} \mathsf{p}\,?\,\mathsf{ping}().\mathsf{p}\,!\,\mathsf{pong}().\mathsf{end},\\ \text{⏱}.\,\& \begin{cases} \mathsf{p}\,?\,\mathsf{ping}().\mathsf{p}\,!\,\mathsf{pong}().\mathsf{end},\\ \text{⏱}.\,\mathsf{end} \end{cases} \end{cases} \end{cases}$$

Role r is the receiver (&, called branching), which waits on two options: it receives from p either the label ok or ko and then it terminates the protocol (end). Role p is the sender (⊕<sup>2</sup>, called selection), and it tries to obtain information on the status of q. It begins by sending a ping message to q (q ! ping()), then waits to receive from q. If a pong is received (q ? pong()) in the top branch, then it concludes that q is reachable and sends this information to r (r ! ok()), after which it terminates. Alternatively, p enters a timeout branch (⏱). For simplicity, we assume p will attempt to communicate with q three times (shown in the three levels of nesting of the timeout branch) before assuming q is unreachable, after which the session also terminates by sending ko to r, followed by end. Along the same lines, the protocol for role q is given by the session type S<sub>q</sub>, where its timeout branches match the timeouts from S<sub>p</sub>.
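The behaviour that the session type of p prescribes can be sketched as a retry loop. This is an informal illustration rather than an implementation of MAGπ: the function name, the string encoding of actions, and the `q_replies` callback (which stands in for whether a pong arrives before the timeout on a given attempt) are all assumptions made for the example.

```python
from typing import Callable, List


def run_p(q_replies: Callable[[int], bool], attempts: int = 3) -> List[str]:
    """Return the trace of actions role p performs, following its session type.

    q_replies(i) is a hypothetical oracle: True if a pong is received
    before p times out on attempt i (0-based).
    """
    trace: List[str] = []
    for attempt in range(attempts):
        trace.append("q!ping")          # send ping to q
        if q_replies(attempt):
            trace.append("q?pong")      # top branch: pong received
            trace.append("r!ok")        # report q as reachable
            return trace
        trace.append("timeout")         # timeout branch taken
    trace.append("r!ko")                # all attempts timed out
    return trace
```

With `attempts = 3` the loop unrolls to exactly the three nested timeout branches of the type for p.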

<sup>1</sup> https://www.rfc-editor.org/rfc/rfc792

<sup>2</sup> For readability, we adopt a shorthand notation for sending towards a single role and for payloads of type unit, such that ⊕{s ! m(unit).S} is represented by s ! m().S.

# 1.1 Contributions

We now present our contributions w.r.t. our Multiparty, Asynchronous, and Generalised π-calculus (MAGπ).

	- MAGπ is the first language to support the widest set of non-Byzantine faults, including message loss, message delays and message reordering; crash failures and link failures; and network partitioning.
	- MAGπ is the first language to introduce timeouts in receive branches (used for handling network failures), as well as support undirected branching in a generalised setting, i.e., the ability to simultaneously expect an incoming message from more than one sender.
	- The MAGπ type system is a conservative extension of a generalised asynchronous MPST theory [33], benefiting from: the ability to model more protocols than traditional syntactic theories (e.g. global types); and the flexibility of checking desired properties, such as deadlock freedom or termination, a posteriori, as opposed to during the design phase.
	- The MAGπ type system supports undirected branching/selection and is the first type system to introduce timeout branches.
	- The MAGπ type system supports a novel and most general reliability definition (§ 3.2), taking into account the viewpoint of each participating role, and is built on optional role-dependent reliability assumptions.

# 2 MAGπ: Language

We present a multiparty session π-calculus, based on the theory of Scalas and Yoshida [33], extended to accurately model real-world distributed network environments. We assume the lowest level of abstraction—the only failure detection mechanism available to a process is an upper-bound wait limit, i.e., a timeout.

$$\begin{array}{rll} c ::= & x \;\mid\; s[\mathsf{p}] & \text{(variable, session w/ role)} \\ d ::= & v \;\mid\; c & \text{(basic value, variable, session w/ role)} \\ w ::= & v \;\mid\; s[\mathsf{p}] & \text{(basic value, session w/ role)} \\ P, Q ::= & \mathbf{0} \;\mid\; (\nu s)\,P & \text{(inaction, restriction)} \\ & \mid\; P \,|\, Q \;\mid\; P + Q & \text{(composition, non-deterministic choice)} \\ & \mid\; c \oplus [\mathsf{q}]\,!\,\mathsf{m}\langle d \rangle.\,P & \text{(selection towards role q)} \\ & \mid\; c\,\&\_{i \in I} \{ [\mathsf{q}\_i]\,?\,\mathsf{m}\_i(x\_i).\,P\_i \} & \text{(branching)} \\ & \mid\; c\,\&\_{i \in I} \{ [\mathsf{q}\_i]\,?\,\mathsf{m}\_i(x\_i).\,P\_i,\ \text{⏱}.\,P' \} & \text{(branching with timeout)} \\ & \mid\; \mathsf{def}\ D\ \mathsf{in}\ P \;\mid\; X\langle \tilde{d} \rangle & \text{(process definition, process call)} \\ & \mid\; s : \sigma & \text{(session buffer)} \\ D ::= & X(\tilde{x}) = P & \text{(declaration)} \\ \sigma ::= & h \cdot \sigma \;\mid\; \epsilon & \text{(buffer)} \\ h ::= & (\mathsf{p} \rhd \mathsf{q} \lhd \mathsf{m}\langle w \rangle) & \text{(message)} \end{array}$$

#### Fig. 1. Syntax for MAGπ

Our calculus presents three novel features: (i) the new timeout primitive; (ii) the capability of expecting a message from different senders; and (iii) operational semantics which can model various non-Byzantine failures. Timeouts can be attached to receive actions (henceforth referred to as branches) and are used to describe an alternative process to be executed in case failures are assumed to occur (akin to error handlers).

Failures are said to be assumed, as opposed to detected, since we model the impossibility of distinguishing between a delayed and a lost message. Thus, it is possible for a process to time out prematurely without its corresponding message having been lost, just as in the real world.

The benefit of our approach is that the failure detection mechanism is agnostic to the type of fault, allowing us to model in unison message loss, message delay, crash-stop failures, link failures, and network partitions.

#### 2.1 Syntax

Definition 1 (Language Syntax). The Multiparty, Asynchronous and Generalised π-calculus syntax is defined by the grammar in fig. 1.

Communication happens over sessions (s, s′) between a number of roles (p, q) ranging over the set ρ. The primitives of the calculus are sessions with roles s[p], and basic values v, both of which can be abstracted using variables (x, y). Processes (P, Q) include the following standard constructs: (i) inaction 0 represents process termination; (ii) session restriction (νs) P binds a new session s in P; (iii) parallel composition declares two concurrent processes; (iv) selection c⊕[q] ! m⟨d⟩. P uses channel c to send a message to q with label m and payload d; after sending, the process continues according to P; (v) definition and declaration allow processes to be assigned names, modelling recursion through the use of process calls. We now elaborate on the novelties in our language.

$$\begin{array}{ll} [\text{R-}\oplus] & s[\mathsf{p}] \oplus [\mathsf{q}]\,!\,\mathsf{m}\langle w \rangle.\,P \;\mid\; s : \sigma \;\longrightarrow\; P \;\mid\; s : \sigma \cdot (\mathsf{p} \rhd \mathsf{q} \lhd \mathsf{m}\langle w \rangle) \cdot \epsilon \\ [\text{R-}\&] & s[\mathsf{q}]\,\&\_{i \in I} \{ [\mathsf{p}\_i]\,?\,\mathsf{m}\_i(x\_i).P\_i\ [,\ \text{⏱}.\,Q] \} \;\mid\; s : (\mathsf{p}\_k \rhd \mathsf{q} \lhd \mathsf{m}\_k\langle w \rangle) \cdot \sigma \;\longrightarrow\; P\_k[w/x\_k] \;\mid\; s : \sigma \quad \text{for } k \in I \\ [\text{R-⏱}] & s[\mathsf{q}]\,\&\_{i \in I} \{ [\mathsf{p}\_i]\,?\,\mathsf{m}\_i(x\_i).P\_i,\ \text{⏱}.\,Q \} \;\mid\; s : \sigma \;\longrightarrow\; Q \;\mid\; s : \sigma \\ [\text{R-}{+}] & P\_1 + P\_2 \;\longrightarrow\; P\_i \quad \text{for } i \in \{1, 2\} \\ [\text{R-}X] & \mathsf{def}\ X(x\_1, \ldots, x\_n) = P\ \mathsf{in}\ (X\langle w\_1, \ldots, w\_n \rangle \mid Q) \;\longrightarrow\; \mathsf{def}\ X(x\_1, \ldots, x\_n) = P\ \mathsf{in}\ (P[w\_1/x\_1] \cdots [w\_n/x\_n] \mid Q) \\ [\text{R-}\mathbb{C}] & P \longrightarrow P' \implies \mathbb{C}[P] \longrightarrow \mathbb{C}[P'] \\ [\text{R-}{\downarrow}] & s : h \cdot \sigma \;\longrightarrow\; s : \sigma \end{array}$$

Fig. 2. Reduction rules for MAGπ


#### 2.2 Operational Semantics

We begin with definitions of a reduction context and buffer congruence.

Definition 2 (Reduction Context). A reduction context C abstracts away an outer environment from a process, and is given by:

$$\mathbb{C} ::= \mathbb{C} \mid P \;\;\big|\;\; (\nu s)\,\mathbb{C} \;\;\big|\;\; \mathsf{def}\ D\ \mathsf{in}\ \mathbb{C} \;\;\big|\;\; [\,]$$

Hence, C[P] refers to process P under some arbitrary context C.

Definition 3 (Buffer Congruence). A process containing only a buffer under its restriction is congruent to inaction. Message buffers observe total reordering.

$$(\nu s) \, s \colon \sigma \equiv \mathbf{0} \qquad\qquad s \colon \sigma\_1 \cdot h\_1 \cdot h\_2 \cdot \sigma\_2 \equiv \, s \colon \sigma\_1 \cdot h\_2 \cdot h\_1 \cdot \sigma\_2$$

Definition 4 (OS). The operational semantics for MAGπ is given via a reduction relation −→ inductively defined in fig. 2, together with standard structural congruence rules [33] and the two buffer congruence rules defined in def. 3.

Let us now comment on the reduction rules (fig. 2). Processes send messages using the selection rule [R-⊕]; this adds the sent message as a new entry in the session buffer, and advances the process to its continuation. Sent messages are read from the buffer using the branching rule [R-&]. If the receiver has a valid branch matching the sender and message label, then it advances to the specific continuation of said branch (a timeout branch for this rule is optional). The substitution P<sub>k</sub>[w/x<sub>k</sub>] denotes the replacement of variable x<sub>k</sub> with the payload value w in the continuation process P<sub>k</sub>. The timeout rule [R-⏱] advances processes to their timeout branch without changing the buffer. Non-deterministic choice is resolved using the choice rule [R-+], which advances the process to one of the two possible continuations. The call rule [R-X] replaces a process call with its defined process, substituting each parameter. Processes can reduce under a context using the context rule [R-C]. Lastly, messages can be lost from the buffer with the drop rule [R-↓].
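The effect of the buffer-manipulating rules can be sketched with a list-based session buffer. This is a simplified illustration, not the calculus itself: the `Msg` record and the function names are hypothetical, processes are not modelled, and a branch is represented only by its set of (sender, label) pairs. Searching the whole list in `receive` reflects the total reordering allowed by buffer congruence.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Msg:
    """A buffer entry: sender, receiver, label and payload (hypothetical encoding)."""
    sender: str
    receiver: str
    label: str
    payload: object = None


Buffer = List[Msg]


def send(buf: Buffer, msg: Msg) -> Buffer:
    """Selection: append the sent message to the session buffer."""
    return buf + [msg]


def receive(buf: Buffer, receiver: str,
            branches: Set[Tuple[str, str]]) -> Optional[Tuple[Msg, Buffer]]:
    """Branching: consume any buffered message matching one of the branches.

    Any position may be consumed because buffers are considered up to reordering.
    Returns None when no matching message is present.
    """
    for i, m in enumerate(buf):
        if m.receiver == receiver and (m.sender, m.label) in branches:
            return m, buf[:i] + buf[i + 1:]
    return None


def timeout(buf: Buffer) -> Buffer:
    """Timeout: the process moves on, the buffer is left unchanged."""
    return list(buf)


def drop(buf: Buffer, index: int) -> Buffer:
    """Drop: the message at the given position is lost."""
    return buf[:index] + buf[index + 1:]
```

Note that `timeout` is applicable even when `receive` would succeed, which is how a premature timeout on a merely delayed message shows up in this sketch.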

We now unpack how our semantics deals with failures. The reduction rules in fig. 2 allow various forms of failures to be modelled, stemming from the versatility and elegance of the drop rule [R-↓]. The following elaborates on how this rule can be utilised to model different types of failure:


$$\begin{aligned} &s[\mathsf{q}]\,\&\_{i \in I} \{ [\mathsf{p}\_i]\,?\,\mathsf{m}\_i(x\_i).P\_i,\ \text{⏱}.\,Q \} \;\mid\; s : (\mathsf{p}\_k \rhd \mathsf{q} \lhd \mathsf{m}\_k \langle w \rangle) \cdot \sigma \\ &\longrightarrow\; Q \;\mid\; s : (\mathsf{p}\_k \rhd \mathsf{q} \lhd \mathsf{m}\_k \langle w \rangle) \cdot \sigma \end{aligned} \qquad \text{for } k \in I.$$

– Total message reordering is modelled via buffer congruence rules (def. 3).

– Network partitions can be represented using multiple link failures.
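The last item can be illustrated on a buffer of in-flight messages. The sketch below rests on an assumption that is not spelled out in the surviving text: a link failure between two roles is read as dropping every buffered message exchanged between them, i.e., repeated use of the drop rule. The tuple encoding and the function names are hypothetical.

```python
from typing import List, Set, Tuple

# (sender, receiver, label); payloads are omitted for brevity.
Message = Tuple[str, str, str]


def fail_link(buffer: List[Message], a: str, b: str) -> List[Message]:
    """Assumed reading of a link failure: lose all messages between roles a and b."""
    return [m for m in buffer if {m[0], m[1]} != {a, b}]


def partition(buffer: List[Message], side_a: Set[str], side_b: Set[str]) -> List[Message]:
    """A network partition as one link failure per pair of roles across the cut."""
    for a in sorted(side_a):
        for b in sorted(side_b):
            buffer = fail_link(buffer, a, b)
    return buffer
```

In the ping example, partitioning {p, r} from {q} removes every ping and pong in transit while leaving the reliable traffic between p and r untouched.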

The granularity at which we model failures allows for degrees of customisation. E.g., benign fault-tolerant consensus algorithms typically assume the possibility of all non-Byzantine faults, therefore all the aforementioned failures are required. Alternatively, an application assumed to run over a trusted TCP network need not worry about single message drops, and hence [R-↓] should only be applied to model node crash and link failures.

Definition 5 (Well-formedness). To ensure that communication is possible, we require that a well-formed process has a buffer for each session, i.e.,

$$P = \left(\nu s\right)Q \implies Q \equiv \left(\nu \tilde{s'}\right)\left(Q' \mid s:\sigma\right)$$

Def. 5 introduces a well-formedness condition to guarantee that a session always guards its buffer, hence ensuring that messages always have a queue to be placed in. From now on, we will only consider well-formed processes.

Before concluding this section, we recall our ping pong running example from the introduction, and present below the processes for roles p, q and r.

Example 2 (Ping Pong: Processes).

$$P\_{\mathsf{p}} = s[\mathsf{p}] \oplus [\mathsf{q}]\,!\,\mathsf{ping}\langle\rangle.\ s[\mathsf{p}]\,\& \begin{cases} [\mathsf{q}]\,?\,\mathsf{pong}().\,P\_{\mathsf{p}}^{\mathsf{ok}},\\ \text{⏱}.\ s[\mathsf{p}] \oplus [\mathsf{q}]\,!\,\mathsf{ping}\langle\rangle.\ s[\mathsf{p}]\,\& \begin{cases} [\mathsf{q}]\,?\,\mathsf{pong}().\,P\_{\mathsf{p}}^{\mathsf{ok}},\\ \text{⏱}.\ s[\mathsf{p}] \oplus [\mathsf{q}]\,!\,\mathsf{ping}\langle\rangle.\,P\_{\mathsf{p}}' \end{cases} \end{cases}$$

$$P\_{\mathsf{p}}^{\mathsf{ok}} = s[\mathsf{p}] \oplus [\mathsf{r}]\,!\,\mathsf{ok}\langle\rangle.\,\mathbf{0} \qquad P\_{\mathsf{p}}' = s[\mathsf{p}]\,\& \begin{cases} [\mathsf{q}]\,?\,\mathsf{pong}().\,P\_{\mathsf{p}}^{\mathsf{ok}},\\ \text{⏱}.\ s[\mathsf{p}] \oplus [\mathsf{r}]\,!\,\mathsf{ko}\langle\rangle.\,\mathbf{0} \end{cases}$$

$$P\_{\mathsf{q}} = s[\mathsf{q}]\,\& \begin{cases} [\mathsf{p}]\,?\,\mathsf{ping}().\,P\_{\mathsf{q}}',\\ \text{⏱}.\ s[\mathsf{q}]\,\& \begin{cases} [\mathsf{p}]\,?\,\mathsf{ping}().\,P\_{\mathsf{q}}',\\ \text{⏱}.\ s[\mathsf{q}]\,\& \begin{cases} [\mathsf{p}]\,?\,\mathsf{ping}().\,P\_{\mathsf{q}}',\\ \text{⏱}.\ \mathbf{0} \end{cases} \end{cases} \end{cases} \qquad P\_{\mathsf{q}}' = s[\mathsf{q}] \oplus [\mathsf{p}]\,!\,\mathsf{pong}\langle\rangle.\,\mathbf{0}$$

$$P\_{\mathsf{r}} = s[\mathsf{r}]\,\& \left\{\ [\mathsf{p}]\,?\,\mathsf{ok}().\,\mathbf{0},\ \ [\mathsf{p}]\,?\,\mathsf{ko}().\,\mathbf{0}\ \right\}$$

# 3 MAGπ: Type System

We introduce the type system for MAGπ, which is a conservative extension of the generalised asynchronous MPST theory [33, sec. 7]. Generalised MPST stray away from global protocol specifications (global types) and instead operate on user-defined localised specifications of each participating role (local types). The benefits of working with such a theory include: (i) the ability to capture a larger set of viable protocols compared to traditional syntactic methods (e.g. global types) of enforcing consistent communication; (ii) the ability to model protocols with different requirements. In particular, instead of syntactically forcing programmers to write, e.g., deadlock-free code, a generalised theory allows programmers to design protocols without restriction and check them a posteriori against any number of required properties, such as deadlock-freedom, termination etc.

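$$\begin{array}{lrl} \textbf{Basic and Session Types} & T ::= & B \;\mid\; S \\ & B ::= & \mathsf{int} \;\mid\; \mathsf{bool} \;\mid\; \mathsf{real} \;\mid\; \mathsf{unit} \;\mid\; \cdots \\ & S ::= & \&\_{i \in I} \{ \mathsf{p}\_i\,?\,\mathsf{m}\_i(T\_i).S\_i\ [,\ \text{⏱}.\,S'] \} \;\mid\; \oplus\_{i \in I} \{ \mathsf{p}\_i\,!\,\mathsf{m}\_i(T\_i).S\_i \} \;\mid\; \mu \mathbf{t}.S \;\mid\; \mathbf{t} \;\mid\; \mathsf{end} \\ \textbf{Buffer Types} & M ::= & \mathsf{p}\,!\,\mathsf{m}(T) \cdot M \;\mid\; \epsilon \\ \textbf{Session-Buffer Types} & \tau ::= & M \;\mid\; S \;\mid\; (M ; S) \end{array}$$

Fig. 3. Basic Types, Session Types, Buffer Types and Session-Buffer Types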

$$\mathsf{T} \equiv \mathsf{T} \qquad\quad \mathsf{M}\_1 \cdot \mathsf{M}\_2 \equiv \mathsf{M}\_2 \cdot \mathsf{M}\_1 \qquad\quad \epsilon \cdot \epsilon \equiv \epsilon \qquad\quad \dfrac{\mathsf{M} \equiv \mathsf{M}' \qquad \mathsf{S} \equiv \mathsf{S}'}{(\mathsf{M} \,;\, \mathsf{S}) \equiv (\mathsf{M}' \,;\, \mathsf{S}')}$$

Fig. 4. Type congruence rules

The novelties of our type system include: (i) undirected branching/selection; (ii) timeout branches (syntax in § 3.1); and (iii) reliability sets—sets of roles assumed to not fail, from the perspective of each role (§ 3.2). Reliability sets (possibly empty) enforce the use of timeouts for all failure-prone communication.

As in [33], our type system does not use global types, but solely relies on local types. Consequently, typing contexts must obey a safety property to ensure subject reduction (§ 3.3). Finally, we present the rules for our type system in § 3.4, and discuss its key properties in § 4.

#### 3.1 Types

Our MPST theory is designed for the distributed computing setting. Concretely, our type system (def. 6) is asynchronous; it allows branching (resp. selection) from (resp. to) multiple roles; and supports timeout continuation types.

Definition 6 (Typing syntax). The typing syntax is defined by the grammar in fig. 3. For undirected branching and selection, I ≠ ∅ and role-label tuples (p<sub>i</sub>, m<sub>i</sub>) must be pairwise distinct. Recursion variables cannot be free and must appear guarded under branching/selection types.

Type T denotes either a basic type B or a session type S, and is used to type variables. Session types describe how a channel should be used: (i) undirected branching (external choice) &<sub>i∈I</sub> {p<sub>i</sub> ? m<sub>i</sub>(T<sub>i</sub>).S<sub>i</sub> [, ⏲.S′]} denotes receiving a message with label m<sub>i</sub> and payload of type T<sub>i</sub> from role p<sub>i</sub>, then continuing according to S<sub>i</sub>. The (optional) timeout continuation type S′ describes the protocol for handling failure on that branch; (ii) undirected selection (internal choice) ⊕<sub>i∈I</sub> {p<sub>i</sub> ! m<sub>i</sub>(T<sub>i</sub>).S<sub>i</sub>} denotes sending a message with label m<sub>i</sub> and payload of type T<sub>i</sub> to role p<sub>i</sub>, then continuing according to S<sub>i</sub>; (iii) type end marks a channel as closed, and terminates communication. A session buffer is typed using the buffer type M. Entries in the buffer must correspond to the type p ! m(T)·M, denoting a message sent to p with label m and payload of type T. A session with role, s[p], is typed using a session-buffer type τ, which is a buffer type, a session type, or a pair combining both.
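As an illustration only, the grammar of session types can be mirrored by a small algebraic data type. The sketch below is not part of MAGπ: the class names, the encoding of payload types as strings, and the labels `pong` and `ko` in the usage example are all hypothetical. It also encodes the side condition from def. 6 that the index set is non-empty and role-label pairs are pairwise distinct.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class End:
    """Type `end`: the channel is closed."""


@dataclass(frozen=True)
class Var:
    """Recursion variable t."""
    name: str


@dataclass(frozen=True)
class Rec:
    """Recursive type mu t.S."""
    var: str
    body: "Session"


@dataclass(frozen=True)
class Choice:
    """One entry p ? m(T).S of a branch, or p ! m(T).S of a selection."""
    role: str
    label: str
    payload: str  # hypothetical encoding: the payload type as a name, e.g. "unit"
    cont: "Session"


@dataclass(frozen=True)
class Branch:
    """Undirected branching with an optional timeout continuation."""
    choices: Tuple[Choice, ...]
    timeout: Optional["Session"] = None


@dataclass(frozen=True)
class Select:
    """Undirected selection."""
    choices: Tuple[Choice, ...]


Session = Union[End, Var, Rec, Branch, Select]


def well_formed_choices(choices):
    """Def. 6 side condition: I is non-empty and (role, label) pairs are distinct."""
    keys = [(c.role, c.label) for c in choices]
    return len(keys) > 0 and len(keys) == len(set(keys))


# Hypothetical usage: wait for `pong` from q; on timeout, send `ko` to r.
example = Branch(
    choices=(Choice("q", "pong", "unit", End()),),
    timeout=Select((Choice("r", "ko", "unit", End()),)),
)
```

Keeping the timeout continuation as an optional field of the branch mirrors the bracketed, optional timeout in the grammar of fig. 3.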

Type congruence ≡ is defined in fig. 4. Notably, buffer types can be reordered, and two session-buffer types are congruent if their individual buffer and session types are congruent. Buffer type reordering is necessary to match the total message reordering supported by the language (def. 3).
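Because buffer types may be reordered freely, two buffer types are congruent exactly when they contain the same entries with the same multiplicities. A minimal sketch of that check follows, assuming (hypothetically) that a buffer type is encoded as a list of `(recipient, label, payload_type)` tuples and that payload types are compared syntactically.

```python
from collections import Counter


def buffers_congruent(m1, m2):
    """Congruence up to total reordering: compare the buffers as multisets."""
    return Counter(m1) == Counter(m2)


# Hypothetical entries: the same two messages listed in a different order.
m1 = [("q", "ping", "unit"), ("r", "ok", "unit")]
m2 = [("r", "ok", "unit"), ("q", "ping", "unit")]
```

Here `buffers_congruent(m1, m2)` holds, whereas dropping or duplicating an entry breaks congruence.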

#### 3.2 Reliability

We take a short detour to discuss reliability. Previous related work [4,1,38] has incorporated the notion of reliability into type systems. Generally, either one specific role or a pre-defined set of roles is assumed to be reliable, i.e., no failures occur for communication involving the identified set of roles.

Our definition of reliability (def. 7) is the most general and the first to take into account the viewpoint of each role. We argue that this is necessary in a distributed setting, since reliability in networks is dependent on the physical topology of processes. Recalling the ping utility (example 1), we could imagine that the processes representing roles p and r reside on the same physical hardware, so their communication cannot be affected by network faults, while the process for q resides on geographically separated hardware, so its communication with both p and r is vulnerable to failure.

Definition 7 (Reliability). The reliability set R for a role p ∈ ρ is defined as R ⊆ ρ \ {p}, capturing the viewpoint of p. Reliability R is defined as a function mapping roles to their reliability set, i.e., R : ρ → R.

To better model real distributed environments, our definition of reliability allows each role to have its own (possibly empty) reliability set.

Example 3 (Ping Pong: Reliability Sets). W.r.t. example 1, the three roles have different viewpoints on each other, so each of them has a different reliability set. In particular, we have R(p) = {r}, R(r) = {p}, R(q) = ∅.

Investigating the extremes: for a set of roles ρ, if R(p) = ∅ for all p ∈ ρ, then no communication is reliable; conversely, if R(p) = ρ \ {p} for all p ∈ ρ, then all communication is reliable, which we refer to as a reliable network. This work only considers static configurations for R, thus reliability sets cannot change at runtime. We find that even with this restriction, our definition is the most general compared to related work.
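The definition and its two extremes can be made concrete with a small sketch. The encoding is an assumption made for illustration: the reliability function is a dictionary from role names to sets of role names, and the function names are hypothetical. The values of `ping_rel` are those of example 3.

```python
def is_valid_reliability(roles, rel):
    """Def. 7: every role p has a reliability set R(p) that is a subset of roles minus {p}."""
    return set(rel) == set(roles) and all(
        rel[p] <= set(roles) - {p} for p in roles
    )


def is_reliable_network(roles, rel):
    """All communication is reliable: R(p) equals roles minus {p} for every p."""
    return all(rel[p] == set(roles) - {p} for p in roles)


def is_fully_unreliable(roles, rel):
    """No communication is reliable: every reliability set is empty."""
    return all(not rel[p] for p in roles)


roles = {"p", "q", "r"}
ping_rel = {"p": {"r"}, "q": set(), "r": {"p"}}  # reliability sets of example 3
```

For the ping pong configuration the reliability function is valid, yet it is neither a reliable network nor a fully unreliable one; it sits between the two extremes.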

#### 3.3 Contexts

Definition 8 (Type contexts). Context Θ is a partial mapping from process variables to n-tuples of types, and context Γ is a partial mapping from variables to types and from sessions with roles to session-buffer types, both defined below:

$$\Theta ::= \emptyset \mid \Theta, X : \mathsf{T}\_1, \ldots, \mathsf{T}\_n \qquad\qquad \Gamma ::= \emptyset \mid \Gamma, x : \mathsf{T} \mid \Gamma, s[\mathsf{p}] : \tau$$

The composition of contexts (Γ<sub>1</sub>, Γ<sub>2</sub>) is defined if:

$$\forall c \in \mathsf{dom}(\varGamma\_1) \cap \mathsf{dom}(\varGamma\_2) : \quad \varGamma\_i(c) = \mathsf{M} \land \varGamma\_j(c) = \mathsf{S}$$

For such c, (Γ<sub>1</sub>, Γ<sub>2</sub>)(c) = (M ; S). Contexts are congruent, written Γ<sub>1</sub> ≡ Γ<sub>2</sub>, if:

$$\mathsf{dom}(\varGamma\_1) = \mathsf{dom}(\varGamma\_2) \land \forall c \in \mathsf{dom}(\varGamma\_1) : \varGamma\_1(c) \equiv \varGamma\_2(c)$$

Context Θ is non-linear and types process variables by tracking the types of their parameters. Context Γ is linear and allows variables to have basic or session types, and sessions with roles to have session-buffer types; as a program progresses, a role may simultaneously have both an active session type and messages queued in the message buffer.

Context composition allows two contexts to coexist as long as their common channels map to buffer types in one context, and session types in the other.

Context congruence holds if two contexts have the same domain and the types of their channels are congruent. It is key to note that by the definitions of context composition and congruence we have s[p] : (M ; S) ≡ s[p] : M, s[p] : S. Buffer types (resp. session-buffer types) are only used internally by the type system; end-users are not expected to explicitly define these types.
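Context composition can be sketched as a partial merge of two finite maps. The encoding below is hypothetical: a context is a dictionary from channels to tagged tuples, where `("M", m)` stands for a buffer type, `("S", s)` for a session type and `("MS", m, s)` for the pair (M ; S). The function returns `None` when the composition is undefined.

```python
def compose(g1, g2):
    """Compose two contexts, or return None if the composition is undefined.

    A channel present in both contexts must carry a buffer type in one
    context and a session type in the other; the result pairs them up.
    """
    out = dict(g1)
    for chan, t2 in g2.items():
        if chan not in out:
            out[chan] = t2
            continue
        t1 = out[chan]
        if {t1[0], t2[0]} != {"M", "S"}:
            return None  # e.g. two session types for the same channel
        buf = t1[1] if t1[0] == "M" else t2[1]
        ses = t1[1] if t1[0] == "S" else t2[1]
        out[chan] = ("MS", buf, ses)
    return out


# Hypothetical channel s[p], encoded as the pair ("s", "p").
sp = ("s", "p")
merged = compose({sp: ("M", ())}, {sp: ("S", "end")})
```

The merged entry for `s[p]` is `("MS", (), "end")`, which corresponds to reading s[p] : M, s[p] : S as s[p] : (M ; S).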

Definition 9 (Context reduction). An action α is given as:

$$\alpha \quad ::= \quad s[\mathfrak{p}] \, ! \, \mathfrak{q} : \mathfrak{m}(\mathbb{T}) \quad \mid \quad s[\mathfrak{p}][\mathfrak{q}] : \mathfrak{m} \quad \mid \quad s[\mathfrak{p}] \, \text{⏲}$$

From left to right, this reads as (i) a sent message; (ii) communication of a message; and (iii) the timeout of a channel. Context transition →<sup>α</sup><sub>(Σ;R)</sub> is defined in fig. 5. We write Γ →<sup>α</sup><sub>(Σ;R)</sub> if ∃Γ′ : Γ →<sup>α</sup><sub>(Σ;R)</sub> Γ′. We define two context reductions →<sub>(Σ;R)</sub> and →<sub>Σ</sub> as:

$$
\Gamma \to\_{\left(\Sigma; \mathcal{R}\right)} \Gamma' \text{ holds } \operatorname{iff} \Gamma \xrightarrow{\alpha}\_{\left(\Sigma; \mathcal{R}\right)} \Gamma'.
$$

$$
\Gamma \to\_{\Sigma} \Gamma' \text{ holds } \text{iff } \Gamma \xrightarrow{\alpha}\_{\Sigma} \Gamma' \text{ for } \alpha \in \{s[\mathfrak{p}] ! \mathfrak{q} : \mathfrak{m}(\mathbb{T}), \ s[\mathfrak{p}][\mathfrak{q}] : \mathfrak{m}\}
$$

We write →<sup>+</sup><sub>(Σ;R)</sub> (resp. →<sup>+</sup><sub>Σ</sub>) and →<sup>∗</sup><sub>(Σ;R)</sub> (resp. →<sup>∗</sup><sub>Σ</sub>) for their transitive and reflexive-transitive closures, respectively.

A context Γ keeps track of open buffers using a buffer-tracker Σ. Whenever a new session is initialised, it is added to Σ (details in § 3.4, rule [T-ν]). For now it suffices to know that buffer trackers restrict communication to occur only over restricted sessions; thus, by def. 5 (well-formedness), a session buffer is guaranteed to exist for all sessions in Σ.

Fig. 5. Context reduction rules

Context reduction (def. 9) models communication at the type level. Context Γ can reduce by sending, communicating, or timing out. By [Γ-⏲], Γ = s[p] : &<sub>i∈I</sub> {q<sub>i</sub> ? m<sub>i</sub>(T<sub>i</sub>).S<sub>i</sub> , ⏲.S′} can reduce to the timeout continuation type S′ if s is in the buffer-tracker (i.e., a buffer exists for session s), and at least one of the roles in the branch is unreliable. The latter prevents taking a timeout for communication that is sure to be delivered. Reductions [Γ-Snd1] and [Γ-Snd2] simulate sending a message by reducing the selection type ⊕<sub>i∈I</sub> {q<sub>i</sub> ! m<sub>i</sub>(T<sub>i</sub>).S<sub>i</sub>} to one of its continuations S<sub>i</sub>, and by inserting the sent message into the buffer type. The difference is that [Γ-Snd1] creates the buffer type if it was previously not specified, whereas [Γ-Snd2] appends the message to an already existing buffer type. Communication between two roles is simulated through [Γ-Com], where a branch type s[q] : &<sub>i∈I</sub> {p<sub>i</sub> ? m<sub>i</sub>(T<sub>i</sub>).S<sub>i</sub> [, ⏲.S′]} consumes the message from a buffer type s[p] : q ! m(T) · M, reducing to the continuations s[p] : M, s[q] : S<sub>k</sub>, where k is the index of the matching branch. Lastly, [Γ-µ] allows reduction through recursion and [Γ-Cong] reduces substructures of compatibly composed contexts.
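The side condition of the timeout rule and the effect of the two send rules can be sketched as follows. This is an illustration based only on the prose description, not on the rules of fig. 5 themselves; the function names and the tuple encoding of buffer entries are hypothetical.

```python
def can_timeout(session, role, senders, has_timeout, tracker, rel):
    """Timeout side condition: the branch defines a timeout, the session is
    in the buffer-tracker, and at least one sender is unreliable for `role`."""
    return (
        has_timeout
        and session in tracker
        and any(q not in rel[role] for q in senders)
    )


def send(buffer, recipient, label, payload):
    """Effect of a send on the sender's buffer type.

    `buffer is None` models a buffer type that was not yet specified (the
    first send rule); otherwise the entry is added to the existing buffer
    (the second send rule). The position of the new entry is immaterial,
    since buffer types are congruent up to reordering.
    """
    entry = (recipient, label, payload)
    return (entry,) if buffer is None else buffer + (entry,)


rel = {"p": {"r"}, "q": set(), "r": {"p"}}  # reliability sets of example 3
```

With these reliability sets, a branch at p that waits on q may time out, whereas a branch at p that waits only on r may not, because r is reliable from the viewpoint of p.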

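Definition 10. Property φ<sub>s</sub> is a (Σ; R)-safety property on typing contexts if:

$$\begin{array}{ll}
[\text{S-R1}] & \varphi\_{\mathsf{s}}(\Gamma, s[\mathsf{p}] : \&\_{i \in I}\{\mathsf{q}\_i \,?\, \mathsf{m}\_i(\mathsf{T}\_i).\mathsf{S}\_i\}) \implies \forall i \in I : \mathsf{q}\_i \in \mathcal{R}(\mathsf{p}) \\
[\text{S-R2}] & \varphi\_{\mathsf{s}}(\Gamma, s[\mathsf{p}] : \&\_{i \in I}\{\mathsf{q}\_i \,?\, \mathsf{m}\_i(\mathsf{T}\_i).\mathsf{S}\_i,\ \text{⏲}.\mathsf{S}'\}) \implies \exists i \in I : \mathsf{q}\_i \notin \mathcal{R}(\mathsf{p}) \\
[\text{S-Com}] & \varphi\_{\mathsf{s}}(\Gamma, s[\mathsf{p}] : \&\_{i \in I}\{\mathsf{q}\_i \,?\, \mathsf{m}\_i(\mathsf{T}\_i).\mathsf{S}\_i \ [,\ \text{⏲}.\mathsf{S}']\}, s[\mathsf{q}] : \mathsf{M}) \text{ and } \mathsf{M} \equiv \mathsf{p} \,!\, \mathsf{m}(\mathsf{T}) \cdot \mathsf{M}' \\
 & \quad \text{and } \exists k \in I : \mathsf{q}\_k = \mathsf{q} \land \mathsf{m}\_k = \mathsf{m} \implies \mathsf{T}\_k = \mathsf{T} \\
[\text{S-}\mu] & \varphi\_{\mathsf{s}}(\Gamma, s[\mathsf{p}] : \mu \mathbf{t}.\mathsf{S}) \implies \varphi\_{\mathsf{s}}(\Gamma, s[\mathsf{p}] : \mathsf{S}[{}^{\mu \mathbf{t}.\mathsf{S}}/\_{\mathbf{t}}]) \\
[\text{S-}{\to}] & \varphi\_{\mathsf{s}}(\Gamma) \text{ and } \Gamma \to\_{(\Sigma; \mathcal{R})} \Gamma' \implies \varphi\_{\mathsf{s}}(\Gamma')
\end{array}$$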

As previously mentioned, our type system is generic and does not use syntactic methods to enforce consistent communication. Therefore, we define a safety property on type contexts (def. 10), which is used to guarantee subject reduction and the other theorems presented in § 4.

We say φ<sub>s</sub> is the largest safety property required to guarantee subject reduction. The property can be re-instantiated with more specific conditions (as demonstrated in § 5) as per the requirements of the implementation. Concretely, [S-R1] ensures that a branch may omit its timeout only if all of its senders are reliable, and [S-R2] ensures that a branch defines a timeout only if at least one of its senders is unreliable; together, a timeout is defined exactly when the communication is unreliable. Condition [S-Com] ensures that communicating messages have matching payload types. Lastly, [S-µ] preserves φ<sub>s</sub> through recursion unfolding and [S-→] requires safety to hold after context reduction.
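The two reliability conditions on a single branch type amount to one check: a timeout is present if and only if some sender is unreliable from the viewpoint of the receiving role. The sketch below covers only this static part of the safety property; the function name and the dictionary encoding of the reliability function are assumptions made for illustration.

```python
def timeout_discipline_ok(role, senders, has_timeout, rel):
    """Combined reading of the two reliability conditions for one branch.

    No timeout   => every sender is in rel[role].
    With timeout => some sender is outside rel[role].
    """
    all_reliable = all(q in rel[role] for q in senders)
    return has_timeout != all_reliable


rel = {"p": {"r"}, "q": set(), "r": {"p"}}  # reliability sets of example 3
```

For instance, a branch at p that receives from q must carry a timeout, so `timeout_discipline_ok("p", ["q"], False, rel)` is false, while a branch at p that receives only from r must not, so `timeout_discipline_ok("p", ["r"], True, rel)` is false as well.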

#### 3.4 Typing Rules

Our type system is defined by the typing rules in fig. 6. Below we explain them in detail. Typing judgements are of the form Θ · Γ ⊢ P, reading "process P is well typed under type contexts Θ and Γ", and Γ ⊢ d : T, reading "value (or variable, or channel) d is of type T under type context Γ".


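$$\begin{array}{c}
[\text{T-}\mathbf{0}]\ \dfrac{\mathsf{end}(\Gamma)}{\Theta \cdot \Gamma \vdash \mathbf{0}}
\qquad
[\text{T-Var}]\ \ c : \mathsf{T} \vdash c : \mathsf{T}
\qquad
[\text{T-Val}]\ \dfrac{v \in \mathsf{B}}{\emptyset \vdash v : \mathsf{B}}
\qquad
[\text{T-}X]\ \dfrac{\Theta(X) = \mathsf{T}\_1, \ldots, \mathsf{T}\_n}{\Theta \vdash X : \mathsf{T}\_1, \ldots, \mathsf{T}\_n}
\\[1.5em]
[\text{T-}{\oplus}]\ \dfrac{\Gamma\_1 \vdash c : \oplus\_{i \in I}\{\mathsf{q}\_i \,!\, \mathsf{m}\_i(\mathsf{T}\_i).\mathsf{S}\_i\} \qquad k \in I \qquad \Gamma\_2 \vdash d : \mathsf{T}\_k \qquad \Theta \cdot \Gamma, c : \mathsf{S}\_k \vdash P}{\Theta \cdot \Gamma, \Gamma\_1, \Gamma\_2 \vdash c \oplus [\mathsf{q}\_k] \,!\, \mathsf{m}\_k\langle d \rangle . P}
\\[1.5em]
[\text{T-}\&]\ \dfrac{\Gamma' \vdash c : \&\_{i \in I}\{\mathsf{p}\_i \,?\, \mathsf{m}\_i(\mathsf{T}\_i).\mathsf{S}\_i \ [,\ \text{⏲}.\mathsf{S}']\} \qquad [\Theta \cdot \Gamma, c : \mathsf{S}' \vdash Q] \qquad \forall i \in I \cdot \Theta \cdot \Gamma, x\_i : \mathsf{T}\_i, c : \mathsf{S}\_i \vdash P\_i}{\Theta \cdot \Gamma, \Gamma' \vdash c \,\&\_{i \in I}\{[\mathsf{p}\_i] \,?\, \mathsf{m}\_i(x\_i).P\_i \ [,\ \text{⏲}.Q]\}}
\\[1.5em]
[\text{T-Call}]\ \dfrac{\Theta \vdash X : \mathsf{T}\_1, \ldots, \mathsf{T}\_n \qquad \mathsf{end}(\Gamma') \qquad \forall i \in 1..n \cdot \Gamma\_i \vdash d\_i : \mathsf{T}\_i}{\Theta \cdot \Gamma\_1, \ldots, \Gamma\_n, \Gamma' \vdash X\langle d\_1, \ldots, d\_n \rangle}
\\[1.5em]
[\text{T-Def}]\ \dfrac{\Theta, X : \mathsf{T}\_1, \ldots, \mathsf{T}\_n \cdot x\_1 : \mathsf{T}\_1, \ldots, x\_n : \mathsf{T}\_n \vdash P \qquad \Theta, X : \mathsf{T}\_1, \ldots, \mathsf{T}\_n \cdot \Gamma \vdash Q}{\Theta \cdot \Gamma \vdash \mathbf{def}\ X(x\_1 : \mathsf{T}\_1, \ldots, x\_n : \mathsf{T}\_n) = P\ \mathbf{in}\ Q}
\\[1.5em]
[\text{T-}{+}]\ \dfrac{\Theta \cdot \Gamma \vdash P\_1 \qquad \Theta \cdot \Gamma \vdash P\_2}{\Theta \cdot \Gamma \vdash P\_1 + P\_2}
\qquad
[\text{T-Lift}]\ \dfrac{\Theta \cdot \Gamma \vdash P}{\Theta \cdot \Gamma \vdash\_{\emptyset} P}
\qquad
[\text{T-}\epsilon]\ \dfrac{\mathsf{gc}(\Gamma)}{\Theta \cdot \Gamma \vdash\_{\{s\}} s : \epsilon}
\\[1.5em]
[\text{T-}\sigma 1]\ \dfrac{\Theta \cdot \Gamma' \vdash\_{\{s\}} s : \sigma \qquad \Gamma \vdash w : \mathsf{T}}{\Theta \cdot \Gamma, \Gamma', s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \epsilon \vdash\_{\{s\}} s : (\mathsf{p} \rhd \mathsf{q} \lhd \mathsf{m}\langle w \rangle) \cdot \sigma}
\\[1.5em]
[\text{T-}\sigma 2]\ \dfrac{\Theta \cdot \Gamma', s[\mathsf{p}] : \mathsf{M} \vdash\_{\{s\}} s : \sigma \qquad \Gamma \vdash w : \mathsf{T}}{\Theta \cdot \Gamma, \Gamma', s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \mathsf{M} \vdash\_{\{s\}} s : (\mathsf{p} \rhd \mathsf{q} \lhd \mathsf{m}\langle w \rangle) \cdot \sigma}
\\[1.5em]
[\text{T-}\sigma w]\ \dfrac{\Gamma = (\Gamma\_0 \rightsquigarrow \Gamma\_1), \Gamma\_2 \qquad \Theta \cdot \Gamma\_1 \vdash\_{\Sigma} s : \sigma \qquad \mathsf{gc}(\Gamma\_0, \Gamma\_2)}{\Theta \cdot \Gamma \vdash\_{\Sigma} s : \sigma}
\\[1.5em]
[\text{T-}{\mid}]\ \dfrac{\Theta \cdot \Gamma\_1 \vdash\_{\Sigma\_1} P\_1 \qquad \Theta \cdot \Gamma\_2 \vdash\_{\Sigma\_2} P\_2 \qquad \Sigma\_1 \cap \Sigma\_2 = \emptyset}{\Theta \cdot \Gamma\_1, \Gamma\_2 \vdash\_{\Sigma\_1 \cup \Sigma\_2} P\_1 \mid P\_2}
\\[1.5em]
[\text{T-}\nu]\ \dfrac{\Gamma' = \{s[\mathsf{p}] : \tau\_{\mathsf{p}}\}\_{\mathsf{p} \in \rho} \qquad s \notin \Gamma \qquad (\{s\}; \mathcal{R})\text{-}\varphi\_{\mathsf{s}}(\Gamma') \qquad \Theta \cdot \Gamma, \Gamma' \vdash\_{\Sigma} P}{\Theta \cdot \Gamma \vdash\_{\Sigma \setminus \{s\}} (\nu s : \Gamma')\, P}
\end{array}$$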

Fig. 6. Typing rules

Fig. 7. Predicate end(Γ)

Fig. 8. The garbage collector predicate gc(Γ)

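$$\begin{array}{rcll}
s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \epsilon \ \rightsquigarrow\ \Gamma, s[\mathsf{p}] : \mathsf{M} & = & \Gamma, s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \mathsf{M} & \\
s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \epsilon \ \rightsquigarrow\ \Gamma & = & \Gamma, s[\mathsf{p}] : \mathsf{q} \,!\, \mathsf{m}(\mathsf{T}) \cdot \epsilon & \text{when } s[\mathsf{p}] : \mathsf{M} \notin \Gamma
\end{array}$$

Fig. 9. Message insertion function Γ′ ⇝ Γ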


Example 4 (Ping Pong: Type Context). Recalling the ping pong example, the whole system can be described by a parallel composition of the three processes representing the roles p, q and r, together with an empty buffer, which is closed under a type context Γ with the following typing assumptions.

$$\begin{array}{l}\Gamma = \{s[\mathsf{p}] : \mathbb{S}\_{\mathsf{p}}, \ s[\mathsf{q}] : \mathbb{S}\_{\mathsf{q}}, \ s[\mathsf{r}] : \mathbb{S}\_{\mathsf{r}}\} \\\\ P\_{ping} = (\nu s : \Gamma) \ P\_{\mathsf{p}} \mid P\_{\mathsf{q}} \mid P\_{\mathsf{r}} \mid s : \epsilon \end{array}$$

# 4 Type Properties

The main results of our MPST system for MAGπ processes are subject reduction (theorem 1) and session fidelity (theorem 2). It is key to note that our results are parametric on the reliability function R. Thus, the theorems we present hold for any configuration of reliability, i.e., from no reliable communication all the way to completely reliable networks.

In order to synchronise reliability assumptions between types and processes, we define the reliable process reduction ⟶<sub>R</sub>, such that ⟶<sub>R</sub> ⊆ ⟶.

Definition 11 (Reliable process reduction). The reliable process reduction ⟶<sub>R</sub> is inductively defined by the same reduction rules as ⟶ (in fig. 2), with the following changes<sup>3</sup>:

$$[\text{R-}\text{⏲}] \quad s[\mathsf{q}] \,\&\_{i \in I}\{[\mathsf{p}\_i] \,?\, \mathsf{m}\_i(x\_i).P\_i,\ \text{⏲}.Q\} \mid s : \sigma \ \longrightarrow\_{\mathcal{R}}\ Q \mid s : \sigma \qquad \text{if } \exists k \in I : \mathsf{p}\_k \notin \mathcal{R}(\mathsf{q})$$

$$[\text{R-}{\downarrow}] \quad s : (\mathsf{p} \rhd \mathsf{q} \lhd \mathsf{m}\langle w \rangle) \cdot \sigma \ \longrightarrow\_{\mathcal{R}}\ s : \sigma \qquad \text{for } \mathsf{q} \notin \mathcal{R}(\mathsf{p})$$

<sup>3</sup> For a fully unreliable network, i.e., ∀p ∈ ρ · R(p) = ∅, ⟶<sub>R</sub> is equivalent to ⟶.

Intuitively, the reliable process reduction disregards network faults for reliable communication. Concretely, a timeout reduction [R-⏲] is only possible if at least one role in the branch is unreliable; and message loss [R-↓] can only occur for messages that are not reliable from the viewpoint of the sender. This ensures that no messages are ignored or lost for reliable communication. Proofs of our theorems, along with any auxiliary results, are given in the technical report [23].

#### 4.1 Subject Reduction

Using ⟶<sub>R</sub>, we now present our subject reduction result. Intuitively, subject reduction states that if a process P is typed under a safe context, and P reliably reduces to some process P′, then the context also reduces (in 0 or 1 steps) to a safe context that types the new process P′.

Theorem 1 (Subject Reduction).

$$\begin{aligned} & \Theta \cdot \Gamma \vdash\_{\Sigma} P \quad \text{and} \quad (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{s}}(\Gamma) \quad \text{and} \quad P \longrightarrow\_{\mathcal{R}} P' \quad \implies \\ & \exists \Gamma' : \ \Gamma \to\_{(\Sigma; \mathcal{R})}^{\{0,1\}} \Gamma' \quad \text{and} \quad (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{s}}(\Gamma') \quad \text{and} \quad \Theta \cdot \Gamma' \vdash\_{\Sigma} P' \end{aligned}$$

A key novel result of our type system is that no unexpected network failures can occur at runtime, i.e., a process always has a failure-handling subprotocol defined for unreliable communication. This follows from the definition of our safety property φ<sub>s</sub> (def. 10) and holds through subject reduction. We state the result in cor. 1: timeout branches are guaranteed to be defined for unreliable communication. The converse is stated in cor. 2, i.e., timeouts are not defined for branches containing only reliable sources.

Corollary 1 (Failure handling safety). Given a reliability function R with p ∉ R(q), if Θ · Γ ⊢<sub>Σ</sub> P with (Σ; R)-φ<sub>s</sub>(Γ) and P ⟶<sup>∗</sup><sub>R</sub> P′ ≡ C[Q], then Q ≠ s[q] &<sub>i∈I</sub> {. . . , [p] ? m(x).Q′}. I.e., Q cannot be a branch at q that receives from p and does not define a timeout.

Corollary 2 (Reliability adherence). Given a reliability function R with R(q) = R<sub>q</sub>, if Θ · Γ ⊢<sub>Σ</sub> P with (Σ; R)-φ<sub>s</sub>(Γ) and P ⟶<sup>∗</sup><sub>R</sub> P′ ≡ C[Q], then Q ≠ s[q] &<sub>i∈I</sub> {[p<sub>i</sub>] ? m<sub>i</sub>(x<sub>i</sub>).Q<sub>i</sub> , ⏲.Q′} such that ∀i ∈ I : p<sub>i</sub> ∈ R<sub>q</sub>. I.e., Q cannot be a branch at q that receives only from reliable roles p<sub>i</sub> and defines a timeout.

#### 4.2 Session Fidelity

Session fidelity states the opposite implication of subject reduction, i.e., if Γ types a process P, and Γ can reduce, then P can match at least one of the context reductions.

Consequently, relevant properties of process P can be deduced from the behaviour of its type context Γ (as we will see in theorem 3). However, as shown by Scalas and Yoshida [33, sec. 5.2], the result does not hold for all well-typed processes. Concretely, session fidelity is violated by: (i) processes that recurse infinitely without being productive (e.g. def X(x) = X⟨x⟩ in X⟨s[p]⟩); and (ii) processes that deadlock by interleaving communication across multiparty sessions. Hence, we assume the necessary conditions on processes to restrict the aforementioned violations, by adapting [33, def. 5.3].

Definition 12 (Conditions for session fidelity). Assume ∅ · Γ ⊢<sub>{s}</sub> P. We say that P:

1. has guarded definitions if, for each process definition in P of the form

def X(x<sub>1</sub> : T<sub>1</sub>, . . . , x<sub>n</sub> : T<sub>n</sub>) = Q in P′

and for all j ∈ 1..n: if T<sub>j</sub> is a session type, then a process call Y⟨. . . , x<sub>j</sub>, . . .⟩ can only occur in Q as a subterm of

x<sub>j</sub> &<sub>i∈I</sub> {[p<sub>i</sub>] ? m<sub>i</sub>(y<sub>i</sub>).P<sub>i</sub> [, ⏲.P<sub>t</sub>]} or x<sub>j</sub> ⊕ [p] ! m⟨y⟩. P′′,

i.e., after x<sub>j</sub> is used for input or output.

2. only plays role p in s, by Γ, if: (i) P has guarded definitions (from 1); (ii) fv(P) = ∅; (iii) Γ = Γ<sub>0</sub>, s[p] : τ with τ ≠ end and end(Γ<sub>0</sub>); and (iv) for all subterms (νs′:Γ′) P′ of P, end(Γ′).

We say "P only plays role p in s" if ∃Γ : ∅ · Γ ⊢<sub>{s}</sub> P and condition 2 holds.

Def. 12 formalises guarded recursion in condition 1, and the notion of only playing a single role for a given session in condition 2. Together, these conditions ensure that session fidelity, stated in theorem 2, holds for all well-typed processes.

Theorem 2 (Session Fidelity). Assume ∅ · Γ ⊢<sub>Σ</sub> P with (Σ; R)-φ<sub>s</sub>(Γ), P ≡ (Π<sub>p∈I</sub> P<sub>p</sub>) | s : σ and Γ = ⋃<sub>p∈I</sub> Γ<sub>p</sub>, and for each P<sub>p</sub>: (i) ∅ · Γ<sub>p</sub> ⊢<sub>Σ</sub> P<sub>p</sub>, and (ii) P<sub>p</sub> is 0 (up to ≡) or only plays role p in s, by Γ<sub>p</sub>. Then,

Γ ⟶<sub>(Σ;R)</sub> implies ∃Γ′, P′ : (i) Γ ⟶<sub>(Σ;R)</sub> Γ′, (ii) P ⟶<sup>+</sup><sub>R</sub> P′, (iii) ∅ · Γ′ ⊢<sub>Σ</sub> P′ with (Σ; R)-φ<sub>s</sub>(Γ′), (iv) P′ = (Π<sub>p∈I</sub> P′<sub>p</sub>) | s : σ′ and Γ′ = ⋃<sub>p∈I</sub> Γ′<sub>p</sub>, and (v) for each P′<sub>p</sub>: ∅ · Γ′<sub>p</sub> ⊢<sub>Σ</sub> P′<sub>p</sub>, and P′<sub>p</sub> is 0 (up to ≡) or only plays role p in s, by Γ′<sub>p</sub>.

#### 4.3 Process Properties

Our result of session fidelity (§ 4.2) allows us to infer runtime properties about programs in MAGπ from their types. We proceed by defining desirable runtime properties on processes (def. 13); expressing the equivalence of these properties at type-level (def. 14); and presenting our result of process properties verification (theorem 3), linking process properties to their type-level equivalences.

From def. 13 below, a process is: (i) R<sub>F</sub>-communication-safe (new w.r.t. [33]) if it reaches the end of communication over reliable reductions and has no leftover messages in the buffer; (ii) deadlock-free if it either reduces or it is inaction; (iii) terminating if it is deadlock-free and can reach inaction in a finite number of steps; (iv) never-terminating if it can always infinitely reduce; and (v) live if, for every reliable branch it can reduce to, it can eventually reduce to some branch continuation. We need not consider branches with timeouts since these are trivially live, given that a process can always reduce over the timeout.

Definition 13 (Process properties, adapted from [33]). For some reliability function R, and full reliability function R<sub>F</sub>, a process P is said to be:

(i) R<sub>F</sub>-communication-safe if

$$P \longrightarrow^{\*}\_{\mathcal{R}\_F} P' \not\longrightarrow\_{\mathcal{R}\_F} \quad \text{and} \quad P' = \mathbb{C}[s : \sigma] \quad \text{implies} \quad \sigma = \epsilon;$$

(ii) deadlock-free if P ⟶<sup>∗</sup><sub>R</sub> P′ ↛<sub>R</sub> implies P′ ≡ 0;

(iii) terminating if it is deadlock-free, and

∃ i finite such that ∀n ≥ i : P = P<sub>0</sub> ⟶<sub>R</sub> P<sub>1</sub> ⟶<sub>R</sub> · · · ⟶<sub>R</sub> P<sub>n</sub> implies P<sub>n</sub> ↛<sub>R</sub>;

(iv) never-terminating if P ⟶<sup>∗</sup><sub>R</sub> P′ implies P′ ⟶<sub>R</sub>;

(v) live if P ⟶<sup>∗</sup><sub>R</sub> P′ ≡ C[Q] implies

$$\begin{aligned} & \text{if } Q = c \,\&\_{i \in I}\{[\mathfrak{q}\_i] \,?\, \mathfrak{m}\_i(x\_i).Q'\_i\}, \text{ then} \\ & \exists \mathbb{C}', k \in I, w : \ P' \longrightarrow^{\*}\_{\mathcal{R}} \mathbb{C}'[Q'\_k[{}^{w}/\_{x\_k}]]. \end{aligned}$$

Note that, differently from other works [4,33], our definition of liveness only speaks about receiving processes, and not sending. Typically, liveness also requires that a sent message (in the case of MAGπ, any message in a session buffer) is always eventually consumed. However, because of the failures that our calculus models, it is possible that a process is live and still has unconsumed messages in the buffer (e.g., as a result of timing out due to a message delay). Additionally, for an R<sub>F</sub>-communication-safe process it follows that all sent messages are consumed in the reliable case. Hence, the traditional definition of liveness still holds for reliable network configurations, and our new definition provides the largest guarantees possible given the failure assumptions.

We now present the type-level equivalences of the above process properties. For liveness, we generalise to the largest liveness property, as done with safety in def. 10, allowing users to define more fine-grained notions of liveness, if required.

From def. 14 below, a type context is: (i) R<sub>F</sub>-communication-safe if it has no populated buffer types when it can no longer reliably reduce; (ii) deadlock-free if the reason why it can no longer reduce is because it is end typed (and possibly, as a result of network failures, has some leftover types that can be garbage collected); (iii) terminating if it is deadlock-free and can reach the end of the protocol in a finite number of steps; (iv) never-terminating if it can always infinitely reduce; and (v) live if, for every reliable branch it can reduce to, there is a series of steps that can reduce to a continuation of that branch.

Definition 14 (Type context properties). For some reliability function R, a full reliability function R<sub>F</sub>, and a set of sessions Σ, we say context Γ is:

(i) (Σ; R<sub>F</sub>)-communication-safe if

$$
\Gamma \longrightarrow^{\*}\_{(\Sigma; \mathcal{R}\_F)} \Gamma' \not\longrightarrow\_{(\Sigma; \mathcal{R}\_F)} \quad \text{and} \quad s[\mathbf{p}] : \mathbb{M} \in \Gamma' \quad \text{implies} \quad \mathbb{M} = \epsilon;
$$

(ii) (Σ; R)-deadlock-free if

$$
\Gamma \longrightarrow^{\*}\_{(\Sigma; \mathcal{R})} \Gamma' \not\longrightarrow\_{(\Sigma; \mathcal{R})} \quad \text{implies} \quad \Gamma' = \Gamma'\_0, \Gamma'' \ \text{ such that } \ \mathbf{end}(\Gamma'\_0) \text{ and } \mathbf{gc}(\Gamma'');
$$

(iii) (Σ; R)-terminating if it is (Σ; R)-deadlock-free, and ∃ i finite such that:

$$\forall n \ge i : \ \Gamma = \Gamma\_0 \longrightarrow\_{(\Sigma; \mathcal{R})} \Gamma\_1 \longrightarrow\_{(\Sigma; \mathcal{R})} \dots \longrightarrow\_{(\Sigma; \mathcal{R})} \Gamma\_n \quad \text{implies} \quad \Gamma\_n \not\longrightarrow\_{(\Sigma; \mathcal{R})};$$

(iv) (Σ; R)-never-terminating if Γ ⟶<sup>∗</sup><sub>(Σ;R)</sub> Γ′ implies Γ′ ⟶<sub>(Σ;R)</sub>;

(v) (Σ; R)-live if it obeys some liveness property (Σ; R)-φ<sub>L</sub> such that:

$$\begin{array}{c} (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{L}}(\Gamma, \, s[\mathsf{p}] : \mathbb{S}) \ \text{ and } \ \mathbb{S} = \&\_{i \in I}\{\mathsf{q}\_{i} \,?\, \mathsf{m}\_{i}(\mathbb{T}\_{i}).\mathbb{S}\_{i}\} \\ \implies \ \exists \Gamma', \, k \in I : \ \Gamma, s[\mathsf{p}] : \mathbb{S} \longrightarrow^{\*}\_{(\Sigma; \mathcal{R})} \Gamma', s[\mathsf{p}] : \mathbb{S}\_{k} \\ (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{L}}(\Gamma, \, s[\mathsf{p}] : \mu \mathbf{t}.\mathbb{S}) \implies (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{L}}(\Gamma, \, s[\mathsf{p}] : \mathbb{S}[{}^{\mu \mathbf{t}.\mathbb{S}}/\_{\mathbf{t}}]) \\ (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{L}}(\Gamma) \ \text{ and } \ \Gamma \to\_{(\Sigma; \mathcal{R})} \Gamma' \implies (\Sigma; \mathcal{R})\text{-}\varphi\_{\mathsf{L}}(\Gamma') \end{array}$$
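When the set of reachable contexts is finite, properties of this shape can be checked by exhaustive exploration of the reduction graph. The sketch below is a generic explicit-state check and not the authors' algorithm: states are opaque hashable values, `successors` is an assumed function returning the list of one-step reducts, and `is_final` is an assumed predicate standing in for "end typed, up to garbage collection".

```python
def reachable(start, successors):
    """All states reachable from `start` (assumes this set is finite)."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def deadlock_free(start, successors, is_final):
    """Every reachable state that cannot reduce further is final."""
    return all(
        is_final(s) for s in reachable(start, successors) if not successors(s)
    )


def never_terminating(start, successors):
    """Every reachable state can still reduce."""
    return all(successors(s) for s in reachable(start, successors))


# Hypothetical toy reduction graphs over integer states.
line = {0: [1], 1: [2], 2: []}
loop = {0: [1], 1: [0]}
```

On the `line` graph, deadlock-freedom holds when state 2 is final and never-termination fails; on the `loop` graph, never-termination holds.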

We are now ready to use these type-level equivalent properties to infer behaviours of the processes they type. We present our result in theorem 3, which formally states that, under the same assumptions given in session fidelity (theorem 2), if a process is typed under some type context, and a property holds on that context, then the same property holds for the process itself.

Theorem 3 (Process properties verification). Assume ∅ · Γ ⊢<sub>Σ</sub> P with (Σ; R)-φ<sub>s</sub>(Γ), P ≡ (Π<sub>p∈I</sub> P<sub>p</sub>) | s : σ and Γ = ⋃<sub>p∈I</sub> Γ<sub>p</sub>. Further, for each P<sub>p</sub>: (i) ∅ · Γ<sub>p</sub> ⊢<sub>Σ</sub> P<sub>p</sub>, and (ii) P<sub>p</sub> ≡ 0 or P<sub>p</sub> only plays role p in s, by Γ<sub>p</sub>. Then, ∀ϕ ∈ {R<sub>F</sub>-communication-safe, deadlock-free, terminating, never-terminating, live}, if (Σ; R)-ϕ(Γ), then P is ϕ.

#### 4.4 Decidability

Since MAGπ is Turing-complete, determining the properties listed in def. 13 from processes is undecidable [5]. A benefit of our generalised theory is that undecidable process properties can be inferred from decidable type-level properties.

Theorem 4 (Decidability). If (Σ; R)-ϕ(Γ) is decidable, then "Θ · Γ ⊢<sub>Σ</sub> P with (Σ; R)-ϕ(Γ)" is decidable.

Our decidability result (theorem 4) states that for any decidable type-level property, type-checking with that property is decidable. However, since MAGπ is asynchronous, we have no results on decidability of ϕ. On the contrary, as discussed in [33, sec. 7], type-level properties for asynchronous type theories are, in some cases, undecidable. This is a result of pairing buffer types with session types, which makes the type system Turing-powerful [3, thm. 2.5]. Scalas and Yoshida [33] address this issue through two methods: (i) standard global types produce type contexts that can be captured through a decidable consistency property; and (ii) restricting the size of the message buffer to make properties decidable. The former ensures decidability by restricting communication to match the expressivity of global types. For the latter, they show that any type context that remains bound within a finite-sized buffer is decidable (since the type has a finite state transition system representation). In line with their results, we lift their definition of boundedness, i.e., a restriction on the size of a buffer, to MAGπ's type system.

Definition 15 (Boundedness, from [33]). We say Γ is (Σ; R)-bound<sub>k</sub> if ∃k ∈ ℕ : Γ ⟶<sup>∗</sup><sub>(Σ;R)</sub> Γ′, s[p] : M implies |M| < k. We say Γ is (Σ; R)-bounded if ∃k finite : (Σ; R)-bound<sub>k</sub>(Γ).
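One way to picture a check for a fixed bound k is a search over reachable states that stops as soon as some buffer reaches length k. The sketch below is a generic illustration rather than the decision procedure of [33]: `successors` and `max_buffer_len` are assumed helper functions over opaque hashable states, and termination relies on the assumption that only finitely many distinct states keep all buffers shorter than k.

```python
def is_bound_k(start, successors, max_buffer_len, k):
    """True if every reachable state keeps all buffers strictly shorter than k."""
    if max_buffer_len(start) >= k:
        return False
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt in successors(state):
            if max_buffer_len(nxt) >= k:
                return False  # bound violated: stop exploring
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def producer_consumer(n):
    """Hypothetical toy system: the state is the number of queued messages.
    At most 3 messages are ever in flight; one may be sent or consumed."""
    steps = []
    if n < 3:
        steps.append(n + 1)
    if n > 0:
        steps.append(n - 1)
    return steps
```

For the toy system, `is_bound_k(0, producer_consumer, lambda n: n, 4)` holds, while the same call with k = 3 fails because a state with three queued messages is reachable.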

Using def. 15, we present our result of decidable bounded properties in theorem 5.

Theorem 5 (Decidable bounded properties). (Σ; R)-bound<sub>k</sub>(Γ) is decidable for all Σ, R, and k. Furthermore, if (Σ; R)-bounded(Γ), then ∀ϕ ∈ {R<sub>F</sub>-communication-safe, deadlock-free, terminating, never-terminating, live}, it holds that (Σ; R)-ϕ(Γ) is decidable.

Thus, decidability is guaranteed for all protocols expressible through standard asynchronous global type theory, and for all protocols that use finite message buffers, now with the benefit of reasoning about and handling network errors!

Example 5 (Ping Pong: Properties). Inspecting the types in example 1 and example 4, we can conclude that Γ = {s[p] : S<sub>p</sub>, s[q] : S<sub>q</sub>, s[r] : S<sub>r</sub>} is bound<sub>4</sub>. By theorem 5, type-level properties of Γ are decidable. On checking them, we determine that Γ is: (i) safe, as it satisfies the safety property (def. 10) required for subject reduction; (ii) R<sub>F</sub>-communication-safe, since if we only consider reliable reductions, no buffer types remain populated; (iii) terminating, since we can count the number of steps taken to reach the end of the protocol; and (iv) live, as reliable communication S<sub>r</sub> always reduces, i.e., a result is always obtained.

# 5 Generalising Network Assumptions

The work presented thus far covers worst-case network assumptions for communication. As beneficial as this may be for low-level network programming, and for complex distributed applications with minimal assumptions (e.g. consensus protocols), not all applications are built on these pessimistic conditions. For example, many distributed applications operate over the Transmission Control Protocol (TCP), and thus assume that if consecutive messages are received from the same source, then they are guaranteed to arrive in the order in which they were sent.

We now showcase the few changes to MAGπ required to alter its network assumptions. It is key to note that these changes produce a subset of MAGπ, thus all relevant properties continue to be valid for its TCP-compliant version.

#### 5.1 From Total to Partial Reordering

In a reliable network configuration designed to run over TCP, messages exchanged between two parties are guaranteed not to be reordered. Therefore, we can adjust the message reordering of MAGπ to model this environment, and strengthen our safety property φ<sup>s</sup> to TCP-safe communication. MAGπ models message reordering through buffer congruence rules, so strengthening congruence suffices to restrict communication to the TCP-safe assumptions.

Definition 16 (TCP process-congruence). The process congruence for the TCP-compliant subset of MAGπ, ≡TCP, is inductively defined using the same rules defining ≡ (in def. 3), but with the following change:

$$s \colon \sigma\_1 \cdot h\_1 \cdot h\_2 \cdot \sigma\_2 \equiv s \colon \sigma\_1 \cdot h\_2 \cdot h\_1 \cdot \sigma\_2$$

$$\text{replaced by}$$

$$\frac{\mathsf{p}\_1 \neq \mathsf{p}\_2 \text{ or } \mathsf{q}\_1 \neq \mathsf{q}\_2}{s \colon \sigma\_1 \cdot (\mathsf{p}\_1 \rhd \mathsf{q}\_1 \lhd \mathsf{m}\_1 \langle w\_1 \rangle) \cdot (\mathsf{p}\_2 \rhd \mathsf{q}\_2 \lhd \mathsf{m}\_2 \langle w\_2 \rangle) \cdot \sigma\_2 \;\equiv\_{\mathsf{TCP}}\; s \colon \sigma\_1 \cdot (\mathsf{p}\_2 \rhd \mathsf{q}\_2 \lhd \mathsf{m}\_2 \langle w\_2 \rangle) \cdot (\mathsf{p}\_1 \rhd \mathsf{q}\_1 \lhd \mathsf{m}\_1 \langle w\_1 \rangle) \cdot \sigma\_2}$$

To obtain the TCP-compliant subset of MAGπ, we assume reductions over fully reliable networks and adopt TCP process congruence from def. 16, which no longer allows messages exchanged between the same pair of roles to be reordered. We now reflect this definition of TCP congruence at the type level in def. 17, and use this to define a TCP-safety property on type contexts in def. 18.
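The side condition of def. 16 can be sketched in a few lines. The sketch below models a buffered message p ▷ q ◁ m⟨w⟩ as a hypothetical 4-tuple `(p, q, label, payload)`; the function names are ours. The check in `tcp_congruent` relies on an observation that is our own rather than a statement of the paper: since only adjacent messages with different role pairs may be swapped, two buffers are related by such swaps exactly when, for every role pair, they contain the same messages in the same order.

```python
from collections import defaultdict


def can_swap(h1, h2):
    """Side condition of the TCP congruence rule: two adjacent messages may
    be swapped only if they differ in at least one of their two roles."""
    p1, q1, _label1, _payload1 = h1
    p2, q2, _label2, _payload2 = h2
    return p1 != p2 or q1 != q2


def per_pair(buffer):
    """Project a buffer onto each role pair, preserving message order."""
    projections = defaultdict(list)
    for p, q, label, payload in buffer:
        projections[(p, q)].append((label, payload))
    return dict(projections)


def tcp_congruent(buffer1, buffer2):
    """Assumed characterisation: buffers are congruent up to the restricted
    swaps iff all of their per-pair projections coincide."""
    return per_pair(buffer1) == per_pair(buffer2)
```

For instance, two messages between the same roles `p` and `q` can never overtake each other, whereas a message involving a third role may be moved past either of them.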

Definition 17 (TCP type-congruence). The type congruence for the TCP-compliant subset of MAGπ, ≡TCP, is inductively defined using the same rules as ≡ (fig. 4), but with the following change:

$$\mathbf{M}\_1 \cdot \mathbf{M}\_2 \equiv \mathbf{M}\_2 \cdot \mathbf{M}\_1 \qquad \text{replaced by} \qquad \frac{\mathbf{p} \neq \mathbf{q}}{\mathbf{p}\,!\,\mathbf{m}\_1(\mathbb{T}\_1) \cdot \mathbf{q}\,!\,\mathbf{m}\_2(\mathbb{T}\_2) \cdot \mathbf{M} \;\equiv\_{\mathsf{TCP}}\; \mathbf{q}\,!\,\mathbf{m}\_2(\mathbb{T}\_2) \cdot \mathbf{p}\,!\,\mathbf{m}\_1(\mathbb{T}\_1) \cdot \mathbf{M}}$$

Definition 18 (TCP safety). Predicate φTCP is a Σ-TCP-safety property on typing contexts if:

$$\begin{array}{l} \varphi\_{\mathsf{TCP}}(\varGamma,\ s[\mathsf{p}] : \&\_{i \in I}\{\mathsf{q}\_{i} : \mathsf{m}\_{i}(\mathsf{T}\_{i}).\mathsf{S}\_{i}\},\ s[\mathsf{q}] : \mathsf{M}) \\ \quad \text{and } \mathsf{M} \equiv\_{\mathsf{TCP}} \mathsf{p}\,!\,\mathsf{m}(\mathsf{T}) \cdot \mathsf{M}' \\ \quad \text{and } \exists\, k \in I : \mathsf{q}\_{k} = \mathsf{q} \implies \mathsf{m}\_{k} = \mathsf{m} \wedge \mathsf{T}\_{k} = \mathsf{T} \\ \varphi\_{\mathsf{TCP}}(\varGamma,\ s[\mathsf{p}] : \mu \mathsf{t}.\mathsf{S}) \implies \varphi\_{\mathsf{TCP}}(\varGamma,\ s[\mathsf{p}] : \mathsf{S}[\mu \mathsf{t}.\mathsf{S}/\mathsf{t}]) \\ \varphi\_{\mathsf{TCP}}(\varGamma) \text{ and } \varGamma \to\_{\Sigma} \varGamma' \implies \varphi\_{\mathsf{TCP}}(\varGamma') \end{array}$$

Similar to our previous definition of safety in def. 10, TCP safety ensures that payload types of communicating entities match. In addition, it requires correct ordering of messages (up to ≡TCP) by checking message labels. This is possible because messages between two parties do not get reordered, and so they must be received in the same order in which they are sent. In order to benefit from the session theorems proved in § 4, all that is required is to show that φTCP ⊆ φs, i.e., any context that is TCP-safe is also safe. This is the only requirement since all theorems in § 4 (i) are parametric on the reliability function R, including fully reliable networks; and (ii) are proven for (Σ; R)-φs(Γ).

Proposition 1 (Containment of φTCP in φs). ∀Γ ∈ φTCP : Γ ∈ φs.

Proof. φTCP uses a fully reliable configuration of MAGπ (i.e., one that is void of failure-handling timeouts) and thus trivially abides by [S-R1] and [S-R2]. [S-µ] is reflected directly in φTCP. [S-→] is reflected for R = R<sup>F</sup>, i.e., for a fully reliable configuration. [S-Com] is never violated by Γ ∈ φTCP since ≡TCP ⊂ ≡. □

# 6 Case Study

This work presents the Ping (examples 1–5) and Domain Name System (§ 6.1) examples as they are widely known, and between them cover the full range of our contributions. Previous related works are not expressive enough to model either protocol with our range of failure assumptions. Thus Ping and DNS are suitable to illustrate how MAGπ pushes the boundaries of MPST. Additional examples are provided in the technical report [23].

#### 6.1 DNS

We now demonstrate the key features of MAGπ through a case study. We present a multiparty example of a Domain Name System (DNS) with a cache and inbuilt load-balancer. This example: (i) reasons about failures in its unreliable connections that are specified using our novel viewpoint-specific reliability sets; (ii) defines failure-handling protocols for these possible failures; (iii) is bounded (def. 15), and thus has decidable type-level properties; and (iv) is safe, R<sup>F</sup>-communication-safe, deadlock-free, terminating, and live. Typically, DNS is implemented over TCP; however, the distributed components can still suffer hardware failures. To cater for this, and to better demonstrate our contributions, we describe the protocol in our failure-prone setting.

Specification. We consider a specification of a client-DNS interaction, where the client consults a cache, and the DNS delegates requests to workers.

The client, represented by role c, wishes to retrieve a web address for a particular URL, and can do so by issuing a request to the DNS. As an optimisation, the client also stores recently retrieved addresses in a local and reliable cache; thus, before issuing new requests to the DNS, it first consults this cache. Upon receiving a request, the DNS offloads processing work to one of two workers, represented by roles w<sub>1</sub> and w<sub>2</sub>. After retrieving the appropriate address, the worker sends the response to the client.

The reliability configuration of this application is as follows: the client and cache have reliable connections, formally R(c) = {cache} and R(cache) = {c}; the DNS and workers have reliable connections, formally R(DNS) = {w1, w2} and R(w1) = R(w2) = {DNS}; all other communications are unreliable.
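The viewpoint-specific nature of this configuration can be made concrete with a small lookup table. This is only an illustrative sketch: the dictionary `RELIABLE` transcribes the sets R(·) given above, while the helper name `is_reliable_for` is an assumption of ours and not part of MAGπ.

```python
# Reliability sets R(role), transcribed from the configuration above.
RELIABLE = {
    "c": {"cache"},
    "cache": {"c"},
    "DNS": {"w1", "w2"},
    "w1": {"DNS"},
    "w2": {"DNS"},
}


def is_reliable_for(role, peer):
    """From the viewpoint of `role`, is communication with `peer` reliable?

    Any pair not listed in RELIABLE is treated as unreliable.
    """
    return peer in RELIABLE.get(role, set())
```

Under this table the client can rely on the cache but not on the DNS or the workers, which is why its interactions with them need failure handling.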

We now present the session types specifying the communication protocol for this distributed application. We adopt shorthand notation for singleton selections, and omit payload types for simplicity, as with the ping example.

Example 6 (DNS protocol).

S<sub>c</sub> = cache !req(). &{ cache ? ans().end, cache ? 404().DNS !req(). &{ w<sub>1</sub> ? ans().cache ! new().end, w<sub>2</sub> ? ans().cache ! new().end, . cache ! ko().end } }

S<sub>cache</sub> = &{ c ? req(). ⊕{ c ! ans().end, c ! 404(). &{ c ? new().end, c ? ko().end } } }

S<sub>DNS</sub> = &{ c ? req(). ⊕{ w<sub>1</sub> !req().w<sub>2</sub> ! ko().end, w<sub>2</sub> !req().w<sub>1</sub> ! ko().end }, . w<sub>1</sub> ! ko().w<sub>2</sub> ! ko().end }

S<sub>w<sub>i</sub></sub> = &{ DNS ? req().c ! ans().end, DNS ? ko().end }

Our viewpoint-specific definition of reliability is necessary to specify the reliable connections between the DNS and workers whilst maintaining unreliable connections with the client. Additionally, the client type S<sub>c</sub> (resp. the DNS type S<sub>DNS</sub>) depends on undirected branching (resp. selection). Hence this example is not expressible using previous theory [4,33].

# 7 Related Work, Conclusions and Future Work

Modelling failures has become a relevant and widely researched topic in recent years. We elaborate on how our generic type system and modular language differs from, and in some cases may possibly subsume, related work.

Majumdar et al. [24] introduce undirected branching as a means of catering for the non-deterministic partial reordering of messages that is possible in networks using the Transmission Control Protocol (TCP). As shown in § 5, the modularity of our type system allows MAGπ to be adapted to support this network configuration, as well as other settings with lower levels of abstraction.

Affine type systems define types that can be used at most once. Affine session types [25,12,6] use affine typing metatheory to allow sessions to be prematurely cancelled in the event of failure. These works only model application-level failure (using try/catch blocks) and do not necessarily describe how a failure is handled, but only allow the initial protocol to be abandoned if failure occurs.

Viering et al. [38] present an MPST theory for event-driven distributed systems, where processes are restarted by monitors if they crash. This approach requires a centralised reliable node, a notion that is subsumed by our viewpoint-specific definition of reliability, def. 7.

Chen et al. [8] remove the need for a centralised reliable node. They equip their type system with synchronisation points capable of detecting and handling failures raised by the nodes that experience them. Similarly, Adameit et al. [1] consider an environment free from a centralised reliable node where unstable links between participants can fail. They introduce the concept of optional blocks, allowing default values to substitute data not received due to communication failure. Viering et al. [37], motivated by consensus algorithms, delegate a group of processes as a permanently available recovery system capable of monitoring processes and informing them of failures. Thus, they no longer rely on one centralised robust node, but instead assume that at least some of the processes that make up the coordinator are alive at any given time. The drawback in these approaches is their reliance on coordination to handle faults. This may not be suitable with certain network configurations and failure models. Since our type system handles failure through low-level techniques, it remains agnostic to the types of failures, and is suitable for any non-Byzantine network configuration.

Recent work by Peters et al. [28] extends global type theory with failure annotations, marking communication susceptible to failures and the kind of failure (specifically either process crashes or message loss). They handle failure by defining default values and branches. Since the theory is an extension of global types, it suffers from the same problems that are addressed through generalised MPST. Additionally, the work is not agnostic to failure models, and so it is uncertain whether the theory is capable of modelling failures other than the two considered.

Most similar to MAGπ is work by Barwell et al. [4], where generalised session type theory is extended to reason about crash-stop failures. They reserve the crash message label, which can be used in receive branches to detect node failure and specify failure-handling subprotocols. In line with our research, their type system is generic, thus improving its expressiveness. However, unlike MAGπ, their theory is not asynchronous, does not support undirected branching/selection, and assumes crash-stops to be the only possible faults, whereas we address and capture a range of failures such as crash failures, link failures, message loss, delays, reordering, and network partitioning.

Distributed variations of the π-calculus [2,30,7,13] introduce process locations—representations of real-world physical hardware. Processes are assigned to locations to form a topology, and locations can be crashed to model failures. None of these calculi model the range of failures that are supported by MAGπ, nor do they have type systems to ensure communication-safe failure handling.

To conclude, we presented MAGπ, a Multiparty, Asynchronous and Generalised π-calculus which addresses the widest set of non-Byzantine faults by using timeouts and the most general reliability definition. Our language builds on the generalised and asynchronous MPST, which is the most flexible for distributed programming. We prove subject reduction and session fidelity, a series of process properties, and fault-handling safety and reliability adherence. As future work, we aim to investigate linear logic for Curry-Howard correspondences in order to understand the foundational and canonical meaning of faults and reliability. We aim to investigate Byzantine faults in combination with the non-Byzantine faults addressed here. Lastly, we will explore the use of model checking to streamline the verification of process properties.

Acknowledgements. We thank the anonymous reviewers and give a special thanks to Simon Fowler for his invaluable support and feedback.

# References



#### System F<sup>µ</sup><sub>ω</sub> with Context-free Session Types

Diogo Poças, Diana Costa, Andreia Mordido, and Vasco T. Vasconcelos

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal {dmpocas,dfdcosta,afmordido,vmvasconcelos}@ciencias.ulisboa.pt

Abstract. We study increasingly expressive type systems, from F<sup>µ</sup> (an extension of the polymorphic lambda calculus with equirecursive types) to F<sup>µ;</sup><sub>ω</sub> (the higher-order polymorphic lambda calculus with equirecursive types and context-free session types). Type equivalence is given by a standard bisimulation defined over a novel labelled transition system for types. Our system subsumes the contractive fragment of F<sup>µ</sup><sub>ω</sub> as studied in the literature. Decidability results for type equivalence of the various type languages are obtained from the translation of types into objects of an appropriate computational model: finite-state automata, simple grammars and deterministic pushdown automata. We show that type equivalence is decidable for a significant fragment of the type language. We further propose a message-passing, concurrent functional language equipped with the expressive type language and show that it enjoys preservation and absence of runtime errors for typable processes.

Keywords: System F, Higher-order kinds, Context-free session types

# 1 Introduction

Extensions of the λ-calculus to include increasingly sophisticated type structures have been extensively studied and have led to systems whose importance is widely recognized: System F [60], System F<sup>µ</sup> [30], System F<sub>ω</sub> [36], System F<sup>µ</sup><sub>ω</sub> [14]. Ideally, we would like to combine a wishlist of type structures and get a super-powerful system with vast expressiveness. However, the expressiveness of types is naturally limited by the universe where they are supposed to live: programming languages. Expressive type systems pose challenges to compilers that other (less expressive) types do not even reveal; one such example is type equivalence checking.

System F can be enriched with different type constructors for specifying communication protocols. We analyse the impact of combinations of such constructors on the type equivalence problem. In order to do so, we extend System F with session types [42,43,67]. Session types provide for detailed protocol specifications in the form of types. Traditional recursive session types are limited to tail recursion, thus failing to capture all protocols whose traces cannot be characterized by regular languages. Context-free session types overcome this limitation by extending types with a notion of sequential composition, T;U [2,68]. The set of types together with the ; binary operation constitutes a monoid, for which a new type, Skip, acts as the neutral element and End acts as an absorbing element.

Support for this research was provided by the Fundação para a Ciência e a Tecnologia through project SafeSessions, ref. PTDC/CCI-COM/6453/2020, and by the LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020. A full version is available on arXiv [20].

T. Wies (Ed.): ESOP 2023, LNCS 13990, pp. 392–420, 2023. https://doi.org/10.1007/978-3-031-30044-8_15

The regular recursive type µα: s. &{Done: End, More: ?Int; α} describes an integer stream as seen from the point of view of the consumer. It offers a choice between Done, after which the channel must be closed (as witnessed by type End), and More, after which an integer value must be received, followed by the rest of the stream. Types are categorised by kinds, so that we know that the recursion variable α is of kind session (denoted by s) and, thus, can be used with semicolon. Instead, we might want to write a type with a more context-free flavour. The type µα: s. &{Leaf: Skip, Node: α; ?Int; α}; End describes a protocol for the type-safe streaming of integer trees on channels. The continuation to the Leaf option is Skip, where no communication occurs but the channel is still open for further composition. The continuation to the Node choice receives a left subtree, an integer at the root and a right subtree. In either case, once the whole tree is received, the channel must be closed, as witnessed by the final End. Beyond first-order context-free session types (where only basic types are exchanged) [2,68] we may be interested in higher-order session types capable of exchanging values of complex types [19]. A goal of this paper is the integration of higher-order context-free session types into System F<sup>µ</sup><sub>ω</sub>. We want to be able to abstract the type that is received on a tree channel, which is now possible by writing λα: t. µβ: s. &{Leaf: Skip, Node: β; ?α; β}; End, where t is the kind of functional types.
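The integer-tree protocol µα: s. &{Leaf: Skip, Node: α; ?Int; α}; End can be pictured as a flat trace of messages. The sketch below is an informal simulation rather than the term language of the paper: the tuple encoding of trees (`None` for a leaf, `(left, value, right)` for a node), the string labels, and the function names are all assumptions made for illustration.

```python
def send_tree(tree):
    """Messages a producer emits for `tree`: a Leaf label, or a Node label
    followed by the left subtree, the root integer and the right subtree."""
    if tree is None:
        return ["Leaf"]
    left, value, right = tree
    return ["Node"] + send_tree(left) + [value] + send_tree(right)


def recv_tree(trace, pos=0):
    """Consume one tree from `trace` starting at `pos`; return the tree and
    the position of the first unread message."""
    label = trace[pos]
    if label == "Leaf":
        return None, pos + 1          # Skip: nothing more for this subtree
    if label != "Node":
        raise ValueError(f"unexpected label {label!r}")
    left, pos = recv_tree(trace, pos + 1)
    value = trace[pos]                # ?Int
    right, pos = recv_tree(trace, pos + 1)
    return (left, value, right), pos


def run_session(tree):
    """Full session: stream the tree, then close the channel with End."""
    trace = send_tree(tree) + ["End"]
    received, pos = recv_tree(trace)
    if trace[pos] != "End" or pos != len(trace) - 1:
        raise ValueError("channel not closed right after the tree")
    return trace, received
```

The recursive call for the left subtree returns before the root integer is read, which is the non-tail recursion that regular session types cannot express.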

A form of abstraction over session types with general recursion was proposed by Das et al. [24,25] via (nested) parametric polymorphism. In the notation of Das et al., we can write a type equation for abstracting the type being received on a stream channel: Stream⟨α⟩ ≐ &{Done: End, More: ?α; Stream⟨α⟩}. Using abstractions, we can write Stream as a function of its parameter α, Stream ≐ λα: t. &{Done: End, More: ?α; Stream α}; alternatively, we can use the µ-operator to rewrite the Stream type as λα: t. (µβ: s. &{Done: End, More: ?α.β}). Das et al. proved that parametrized type definitions over regular session types are strictly more expressive than context-free session types. To some extent, this analogy guides our approach: if adding abstraction (via parametric polymorphism) to regular types leads to nested types, what exactly does it mean to add abstraction (via a type-level λ-operator) to context-free types? Throughout this paper we analyse several increments to System F<sup>µ</sup> that culminate in adding λ-abstraction to context-free session types.

One of our focuses is necessarily the analysis of the type equivalence problem. The uncertainty about the decidability of this problem over recursive parametric types goes back to the 1970s [16,63]. Although the type equivalence problem for parametric (nested) session types and context-free session types is decidable, that for the combination of abstractions over context-free types may no longer be. In fact, this analysis constitutes an interesting journey towards a better understanding of the role of higher-order polymorphic recursion in the presence of sequential composition, as well as the gains (and losses) resulting from combining abstraction with arbitrary (rather than tail) recursion.

Ultimately, decidability is not a sufficiently valuable measure regarding a type system's practicality. We look for type systems that may be incorporated into compilers. For that reason, we are interested in algorithms for type equivalence checking. Equivalence in F<sup>µ</sup><sub>ω</sub> alone is already at least as hard as equivalence of deterministic pushdown automata. If we restrict recursion to the monomorphic case (requiring recursion variables to denote proper types, that is of kind s or t, collectively denoted by ∗) we lower the complexity of type equivalence to that of equivalence for finite-state automata. The extension with context-free session types is slightly more complex. In order to obtain "good" algorithms, we restrict the recursion to the monomorphic case, arriving at classes F<sup>µ∗</sup><sub>ω</sub>, F<sup>µ∗;</sup><sub>ω</sub>. Now the type equality problem for F<sup>µ∗;</sup><sub>ω</sub> translates to the equivalence problem for simple grammars, which is still decidable [4,33]. Since F<sup>µ∗;</sup><sub>ω</sub> subsumes F<sup>µ∗</sup><sub>ω</sub>, our proof of the decidability of type equivalence serves as an alternative to that of Cai et al. [14] (restricted to contractive types).

Higher-order polymorphism allows for the definition of type operators and the internalisation of various (session-type) constructs that would otherwise be offered as built-in constructors. In this way, we are able to internalise basic session-type constructors such as sequential composition ; and the Dual type operator (which reverses the direction of communication between parties). Duality is often treated as an external macro. Gay et al. [34] explore different ways of handling the dual operator, all in a monomorphic setting. In the presence of polymorphism the dual operator cannot be fully eliminated without introducing co-variables. Internalisation offers a much cleaner solution.

Due to the presence of sequential composition, regular trees are not a powerful enough model for representing types (type TreeC a in Section 2 is an example). The main technical challenge when combining System F<sup>µ</sup><sub>ω</sub> and context-free session types is making sure that the resulting model can still be represented by simple grammars, so that type equivalence may be decided by a practical algorithm. The difficulties arise with renaming bound variables. For infinite types, both renaming with fresh variables and using de Bruijn indices may create an infinite number of distinct variables, which makes the construction of a simple grammar impossible. For example, take the type λα: t. µγ: t. λβ: t. α → γ, which stands for the infinite type λα: t. λβ: t. α → λβ: t. α → λβ: t. … Renaming this type using a fresh variable at each step would result in a type of the form λυ<sub>1</sub>: t. λυ<sub>2</sub>: t. υ<sub>1</sub> → λυ<sub>3</sub>: t. υ<sub>1</sub> → λυ<sub>4</sub>: t. …, requiring infinitely many variables. Similarly, de Bruijn indices [27] yield a type of the form λ<sub>t</sub>. λ<sub>t</sub>. 1 → λ<sub>t</sub>. 2 → λ<sub>t</sub>. 3 → … that requires an infinite number of natural indices. We thus introduce minimal renaming, which uses as few variable names as possible (cf. Gauthier and Pottier [30]). This ensures that only finitely many terminal symbols are necessary, allowing for translating types into simple grammars.
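The growth of de Bruijn indices in this example can be observed on finite unfoldings. The following sketch is illustrative only: the tuple-based term encoding and the function names are assumptions of ours, kinds are omitted, and indices are counted from 0 at the nearest binder, matching the indices 1, 2, 3, … quoted above.

```python
def unfold(n):
    """n-fold unfolding of  λα. µγ. λβ. α → γ, leaving the innermost
    recursion variable γ as a free variable."""
    body = ("var", "γ")
    for _ in range(n):
        body = ("lam", "β", ("arrow", ("var", "α"), body))
    return ("lam", "α", body)


def de_bruijn_indices(term, env=()):
    """List the de Bruijn index of every bound variable occurrence, left to
    right. `env` holds the enclosing binders, innermost first."""
    kind = term[0]
    if kind == "var":
        name = term[1]
        return [env.index(name)] if name in env else []
    if kind == "lam":
        return de_bruijn_indices(term[2], (term[1],) + env)
    if kind == "arrow":
        return de_bruijn_indices(term[1], env) + de_bruijn_indices(term[2], env)
    raise ValueError(f"unknown term {term!r}")


def binder_names(term):
    """Set of distinct binder names used in `term`."""
    kind = term[0]
    if kind == "var":
        return set()
    if kind == "lam":
        return {term[1]} | binder_names(term[2])
    return binder_names(term[1]) | binder_names(term[2])
```

Every further unfolding adds a new index for α, while the named presentation keeps using the same two binder names, which is the finiteness that a renaming scheme has to preserve.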

Type languages live in term languages and we propose a term language to consume F<sup>µ;</sup><sub>ω</sub> types. Based on Almeida et al. [2], we introduce a message-passing concurrent programming language. Type checking is decidable if type equivalence is, and it is, in particular, for F<sup>µ∗;</sup><sub>ω</sub>.

The main contributions of this paper are as follows.


The type system presented in the paper combines three constructions: sequential composition of session types, higher-order kinds via type-level abstraction and application, and higher-order recursion. Prior to our work there is the system by Almeida et al. [4] which incorporates sequential composition and (first-order) recursion, but no higher-order kinds. There is also the system by Cai et al. [14] which incorporates higher-order kinds and higher-order recursion, but no sequential composition. Our system is the first to incorporate all three constructions. Although some of the results are incremental and generalize results from the literature, the main technical challenge is understanding the border past which they do not hold anymore. For example, "just" including higher-order kinds into the system by Almeida et al. does not work, since we need to pay close attention to variable names, making sure that type equivalence is invariant with respect to alpha-conversion (renaming of bound variables). This called for a novel notion of renaming, inspired by Gauthier and Pottier [30]. Similarly, "just" including sequential composition into the system of Cai et al. does not work, since finite-state automata (or regular trees) are not enough to capture the expressive power of the new type system, even when restricted to first-order recursion. This required us to look at the more expressive framework of simple grammars, and introduce a translation from types to words of a simple grammar.

The rest of the paper is organised as follows. The next section motivates the type language and introduces the term language with an example. Section 3 introduces System F<sup>µ;</sup><sub>ω</sub>, Section 4 discusses type equivalence and Section 5 shows that type equivalence is decidable for a fragment of the type language. Section 6 presents the term language and its metatheory. Section 7 discusses related work and Section 8 concludes the paper with pointers for future work. Proofs for the main results can be found in a technical report on arXiv [20].

# 2 Motivation

Our goal is to study type systems that combine equirecursion, higher-order polymorphism, and higher-order context-free session types, while incorporating these in programming languages.

$$\begin{array}{llll} (\!|\,|\!) ::= \{\} \mid \langle\rangle & \sharp ::= {?} \mid {!} & \odot ::= \& \mid \oplus & \ast ::= \mathsf{t} \mid \mathsf{s} \end{array}$$

$$\begin{array}{lll} T ::= T \to T \mid (\!|\, l\_i : T\_i \,|\!) \mid \forall\alpha\colon\kappa.\,T \mid \mu\,\alpha\colon\kappa.\,T \mid \alpha & (F^{\mu}) & \kappa = \mathsf{t} \\ T ::= (F^{\mu}) \mid \sharp T.T \mid \odot\{l\_i : T\_i\} \mid \mathsf{End} & (F^{\mu\cdot}) & \kappa = \ast \\ T ::= (F^{\mu}) \mid \sharp T \mid \odot\{l\_i : T\_i\} \mid \mathsf{End} \mid T;T \mid \mathsf{Skip} & (F^{\mu;}) & \kappa = \ast \\ T ::= (F^{M}) \mid \lambda\alpha\colon\kappa.\,T \mid T\,T & (F^{M}\_{\omega}),\ M ::= \mu, \mu\cdot, \mu; & \kappa = \ast \mid \kappa \Rightarrow \kappa \end{array}$$

Fig. 1: Six F-systems.

Extensions of System F. Figure 1 motivates the construction by proposing six different type languages, culminating with F<sup>µ;</sup><sub>ω</sub>. The initial system, F<sup>µ</sup>, includes well-known basic type operators [57]: functions T → U, records {l<sub>i</sub> : T<sub>i</sub>} and variants ⟨l<sub>i</sub> : T<sub>i</sub>⟩. Type Unit is short for {}, the empty record; we can imagine that Unit stands in place of an arbitrary scalar type such as Int and Bool. We also include variable names α, type quantification ∀α: κ. T and recursion µα: κ. T. To control type formation, all variable bindings must be kinded with some kind κ, even if for the initial system, F<sup>µ</sup>, we only use the functional kind t.

We then build on F<sup>µ</sup> by considering (regular, tail recursive) session types; we represent the resulting system by F<sup>µ·</sup>. For example ?Int.!Bool.End is a type for a channel endpoint that receives an integer, sends a boolean, and terminates. At this point we introduce a kind s of session types to restrict the ways in which we can combine session and functional types together. For example, a well-formed type ?T.U is of kind s and requires U to be also of kind s (whereas T can be of kind ∗, that is s or t). An example of an infinite session type is µα: s. !Int.α, which endlessly outputs integer values. For a more elaborate example consider the type IntStream = µα: s. &{Done: End, More: ?Int.α} that specifies a channel endpoint for receiving a (finite or infinite) stream of integer values. Communication ends after choice Done is selected.

The next step of our construction takes us to context-free session types; the resulting system is denoted by F<sup>µ;</sup>. We introduce a new construct for sequential composition T;U, and a new type Skip, acting as the neutral element of sequential composition [68]. The message constructors are now unary (?T and !T) rather than binary. In System F<sup>µ;</sup> we distinguish between the traditional End type and the Skip type. These types have different behaviours: End terminates a channel, while Skip allows for further communication. Type equality is more subtle for context-free session types, because of the monoidal semantics of sequential composition. It is derivable from the following axioms:
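As a sketch, the monoidal laws described in the text (Skip neutral, sequential composition associative, End absorbing) can be written as below. The symbol ≃ for type equivalence is our notational assumption, and this list covers only the laws that the prose states explicitly; it should not be read as the complete axiomatisation.

$$\begin{array}{ll} \mathsf{Skip};\, T \simeq T & T;\, \mathsf{Skip} \simeq T \\ (T;\, U);\, V \simeq T;\, (U;\, V) & \mathsf{End};\, T \simeq \mathsf{End} \end{array}$$

Read operationally, the last law says that nothing composed after End can ever be reached, since End closes the channel.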


Fig. 2: Relation between the main classes of types in this paper (arrows denote strict inclusions).

Although the syntax of F<sup>µ·</sup> is not formally included in the syntax of F<sup>µ;</sup>, we can embed recursive session types into context-free session types by mapping ♯T.U into ♯T;U. It is well-known that context-free session types allow for higher computational expressivity: while F<sup>µ</sup> and F<sup>µ·</sup> can be represented via finite-state automata, F<sup>µ;</sup> can only be represented with simple grammars [4,33].

To finalise our construction, we include type abstraction λα: κ.T and type application T U. Again, type abstraction binds a variable which must be kinded. Kinds can now be of higher-order κ ⇒ κ′. For each of the three systems F<sup>µ</sup>, F<sup>µ·</sup>, F<sup>µ;</sup> we arrive at a higher-order version, respectively F<sup>µ</sup><sub>ω</sub>, F<sup>µ·</sup><sub>ω</sub>, F<sup>µ;</sup><sub>ω</sub> (all of which we represent as F<sup>M</sup><sub>ω</sub>). In System F<sup>µ·</sup><sub>ω</sub>, for example, we can specify channels for receiving (finite or infinite) sequences of values of arbitrary (but fixed) types,

$$\mathsf{Stream} = \lambda \alpha \colon \mathsf{t}. (\mu \beta \colon \mathsf{s}. \& \{\mathsf{Done} \colon \mathsf{End}, \mathsf{More} \colon ? \alpha. \beta\})$$

where α can be instantiated with the desired type; in particular, Stream Int would be equivalent to the aforementioned IntStream.

It turns out that the expressive power of general higher-order systems F<sup>M</sup><sub>ω</sub> is too large for practical purposes. Even the simplest case F<sup>µ</sup><sub>ω</sub> is at least as expressive as deterministic pushdown automata (or equivalently, first-order grammars), for which known equivalence algorithms are notoriously impractical. By impractical we mean that, although there exists a proof of decidability (due to Sénizergues [61], later improved by Stirling and Jančar [46,65]), the underlying algorithm is rather complex. To the best of our knowledge, there is no practical implementation of an algorithm to decide the equivalence of deterministic pushdown automata. This is essentially due to polymorphic recursion, which can be encoded by a higher-order µ-operator (we provide an example at the end of Section 5). Therefore, it makes sense to restrict the kind κ of the recursion operator µα: κ. T. We use the notation µ∗ to mean the subclass of types written using only ∗-kinded recursion, i.e., µα: t. T or µα: s. T.

Figure 2 summarizes the main relations between the classes of types in our paper. Firstly, we obtain a lattice where the expressive power increases as we travel down (from functional to session to context-free session types) and right (from simple polymorphism to higher-order polymorphism with monomorphic recursion to arbitrary recursion). Four of the classes can be represented using finite-state automata (up to F<sup>µ∗·</sup><sub>ω</sub>). By including sequential composition (F<sup>µ;</sup> and F<sup>µ∗;</sup><sub>ω</sub>) we are still able to represent types using simple grammars. Once we allow for arbitrary recursion, the expressiveness of our model requires the computational power of deterministic pushdown automata.

Programming with F<sup>µ;</sup><sub>ω</sub>. We now turn our attention to the term language, a message-passing, concurrent functional language, equipped with context-free session types. Start with a stream of values of type a. Such a stream, when seen from the side of the reader, offers two choices: Done and More. In the former case the interaction is over; in the latter the reader reads a value of type a, as in ?a, and recurses. This is the stream type we have seen before except that, rather than closing the channel endpoint (with type End), it terminates with type Skip, so that it may be sequentially composed with other types. In this informal introduction to the term language we omit the kinds of type variables.

```
type Stream a = &{ Done : Skip , More : ? a ; Stream a }
```
A fold channel, as seen from the side of the folder, is a type of the following form. We assume that application binds tighter than semicolon, that is, type Stream a ; !b ; End is interpreted as (Stream a) ; !b ; End.

```
type Fold a b = ?( b → a → b ) ; ? b ; Stream a ; ! b ; End
```
Consumers of this type first receive the folding function, then the starting element, then the elements to fold in the form of a stream, and finally output the result of the fold. The type terminates with End for we do not expect type Fold to be further composed. Compare Fold with the type for a conventional functional left fold: (b → a → b) → b → List a → b.

We now develop a function that consumes a Fold channel. Syntax x . f is for the inverse function application with low priority, that is x . f . g = g (f x). Recall that Unit is an alternative notation for the empty record type, {}.

```
foldServer : ∀a .∀b . Fold a b → Unit
foldServer c = let (f , c ) = receive c in
               let (e , c ) = receive c in foldS f e c
foldS : ∀a .∀b . ( b → a → b ) → b → Stream a ;! b ; End → Unit
foldS f e c = match c with
  { Done c → c . send e . close
  , More c → let (x , c ) = receive c in foldS f ( f e x ) c
  }
```
Function foldServer consumes the initial part of the channel and passes the rest of the channel to the recursive function foldS that consumes the whole stream while accumulating the fold value. In the end, when branch Done is selected, the fold value is written on the channel and the channel closed. In general, the channel operators—receive, send, select—return the same channel in the form of a new identifier. It is customary to reuse the identifier name—c in the example, as in let (f, c)= receive c—since it denotes the same channel. Syntax c . ... hides the continuation channel. The case for the external choice—match—also returns the continuation (in each branch) so that interaction on the channel endpoint may proceed.
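The protocol followed by foldServer can be mimicked, purely as an illustration, in Python. The sketch below is not FreeST and enforces no session types: it assumes that one bidirectional channel can be modelled by a pair of `queue.Queue` objects, and the names `fold_server` and `run_fold` are hypothetical.

```python
import queue
import threading


def fold_server(chan_in, chan_out):
    """Follow the Fold protocol: ?(b -> a -> b) ; ?b ; Stream a ; !b ; End."""
    f = chan_in.get()        # receive the folding function
    acc = chan_in.get()      # receive the starting element
    while True:              # consume the stream, one choice at a time
        tag = chan_in.get()
        if tag == "Done":
            break
        # tag == "More": one value follows on the channel
        acc = f(acc, chan_in.get())
    chan_out.put(acc)        # output the result; the session ends here


def run_fold(f, seed, values):
    """A client that streams `values` to a fold_server thread."""
    to_server, from_server = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=fold_server, args=(to_server, from_server))
    worker.start()
    to_server.put(f)
    to_server.put(seed)
    for v in values:
        to_server.put("More")
        to_server.put(v)
    to_server.put("Done")
    result = from_server.get()
    worker.join()
    return result
```

Unlike the typed version, nothing here prevents a client from sending messages in the wrong order; that guarantee is precisely what the type Fold a b provides.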

We may now write different clients for the foldServer. Examples include a client that generates a stream from a pair of integer values (denoting an interval); another that generates the stream from a list of values; and yet another that generates the stream from a binary tree. We propose a further client. Consider the type of a channel that exchanges trees in a serialized format [68]. Its polymorphic version, as seen from the point of view of the reader, is as follows:

```
type TreeChannel a = TreeC a ; End
type TreeC a = &{ Leaf : Skip , Node : TreeC a ;? a ; TreeC a }
```
We transform trees as we read from tree channels into streams. Function flatten receives a tree channel and a stream channel (as seen from the point of view of the writer, hence the Dual) and returns the unused part of the latter.

```
flatten : ∀a .∀c . TreeChannel a → ( Dual Stream a ) ; c → c
```
We are now in a position to write a client that checks whether all values in a tree channel are positive.

```
allPositive : TreeChannel Int → Dual ( Fold Int Bool ) → Bool
allPositive t c =
  let c = send (λx : Bool .λy :Int. x && y > 0) c in
  let c = send True c in
  let c = flatten [Int ] [? Bool ;End ] t c in
  let (x , c ) = receive c in
  close c ; x
```
The client sends a function and the starting value on the fold channel. Then, it flattens the given tree t, receives the folded value and closes the channel. Syntax flatten [Int] [?Bool;End] is for term-level type application. We mean to flatten a tree of Int values on a stream channel whose continuation is of type ?Bool;End. The continuation channel is bound to c so that we may further receive the fold value and thereupon close the channel. Syntax e1;e2 is for sequential composition and abbreviates let {} = e1 in e2 given that {}, the Unit value, is linear and hence must be consumed.

Finally, a simple application creates a new TreeC channel, passing one end to a thread that produces a tree channel. Function new creates a channel and returns its two ends. It then creates a Fold channel, distributes one end to a thread foldServer and the other to function allPositive. The fork primitive receives a suspended computation (a thunk, of the form λx:Unit.e) and creates a new thread that runs in parallel with that from where the fork was issued.

```
system : Bool
system = let ( tr , tw ) = new [ TreeC Int] () in
  fork (λ_ : Unit . produce tw ) ;
  let ( fr , fw ) = new [ Fold Int Bool ] () in
  fork (λ_ : Unit . foldServer fr ) ;
  allPositive tr fw
```


Fig. 3: The syntax of types.

Fig. 4: Type constants and kinds.

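Type renaming rename<sub>S</sub>(T)

$$\begin{array}{l} \mathsf{rename}_S(\iota) = \iota \qquad \mathsf{rename}_S(\alpha) = \alpha \qquad \mathsf{rename}_S(T\,U) = \mathsf{rename}_{S \cup \mathsf{fv}(U)}(T)\ \mathsf{rename}_S(U) \\ \mathsf{rename}_S(\lambda\alpha\colon\kappa.T) = \lambda\upsilon\colon\kappa.\,\mathsf{rename}_S(T[\upsilon/\alpha]) \quad \text{where } \upsilon = \mathsf{first}_S(\lambda\alpha\colon\kappa.T) \end{array}$$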
Fig. 5: Type renaming.

# 3 Kinds and Types

This section introduces in detail System F<sup>µ;</sup><sub>ω</sub>, an extension of System F<sup>µ</sup><sub>ω</sub> incorporating higher-order context-free session types. The syntax of types is presented in Fig. 3. A type is either a constant ι (as in Fig. 4), a type variable α, an abstraction λα: κ.T or an application T U. Besides incorporating the standard session type constructors as constants, System F<sup>µ;</sup><sub>ω</sub> also includes Dual as a constant for a type operator mapping a session type to its dual. Note also that ∀α: κ. T is syntactic sugar for ∀<sub>κ</sub> (λα: κ.T). Analogously, µ α: κ. T abbreviates µ<sub>κ</sub> (λα: κ.T). This simplifies our analysis as lambda abstraction becomes the only binding operator.

A distinction between session and functional types is made resorting to kinds s and t, respectively. These are the kinds of proper types, ∗; we use the symbol κ to represent either the kind of a proper type or that of a type operator, of the form κ ⇒ κ′. A kinding context ∆ stores kinds for type variables using bindings of the form α: κ. Notation ∆ + α: κ denotes the update of kinding context ∆, defined as (∆, α: κ) + α: κ′ = ∆, α: κ′ and ∆ + α: κ = ∆, α: κ when α ∉ ∆.

To define type formation, we require a few notions. Firstly comes the notion of renaming, adapted from Gauthier and Pottier [30] and presented in Fig. 5. Renaming essentially replaces a type T by a minimal alpha-conversion of T. By alpha-conversion we mean that rename<sub>S</sub>(T) renames bound variables in T. By "minimal" we mean that each bound variable is renamed to its lowest possible value. We assume at our disposal a countable well-ordered set of type variables {υ<sub>1</sub>, . . . , υ<sub>n</sub>, . . .}. In rename<sub>S</sub>(T), parameter S is a set containing type variables unavailable for renaming; at the outset of the renaming process S is the empty set, since all variables are available. In that case the subscript S is often omitted. The case for lambda abstraction renames the bound variable by the smallest variable not in the set S ∪ fv(λα: κ.T), which we denote by first<sub>S</sub>(λα: κ.T).

Renaming is what allows us to check whether type abstractions λα: κ.T, λβ : κ.U are equivalent. For the types to be equivalent, both bound variables α and β ought to be renamed to the same variable υ<sub>j</sub>. In summary, renaming provides a syntax-guided approach to the equivalence of lambda-abstractions, where the names of bound variables should not matter. Our notion of type equivalence preserves alpha-conversions up to renaming: if T and U only differ on bound variables, then rename(T) = rename(U) and in particular rename(T) ∼ rename(U). We will come back to this point after we define type equivalence in Section 4.

We can easily see that renaming uses the minimum number of variable names possible; for example, rename(λα: t.λβ : s.β) = λυ<sub>1</sub> : t.λυ<sub>1</sub> : s.υ<sub>1</sub>. Notice how both bound variables α and β are renamed to υ<sub>1</sub>, the first variable available for replacement. Also, renaming blatantly violates Barendregt's variable convention [9] used in so many works; for example rename(υ<sub>1</sub> (λα: t.α)) = υ<sub>1</sub> (λυ<sub>1</sub> : t.υ<sub>1</sub>), where variable υ<sub>1</sub> is both free and bound in the resulting type. Even if renaming violates the variable convention, substitution can still be performed without resorting to the "on-the-fly" renaming of Curry and Feys [21,40]. When υ<sub>1</sub> ≠ υ<sub>2</sub>, we have that

> (λυ<sub>1</sub> : κ.λυ<sub>2</sub> : κ′.U) T reduces to rename((λυ<sub>2</sub> : κ′.U)[T/υ<sub>1</sub>]).

Then, we have (λυ<sub>2</sub> : κ′.U)[T/υ<sub>1</sub>] = λυ<sub>2</sub> : κ′.(U[T/υ<sub>1</sub>]) since the renaming rule for application guarantees that υ<sub>2</sub> ∉ fv(T). Otherwise, if υ<sub>1</sub> = υ<sub>2</sub>, we have (λυ<sub>1</sub> : κ′.U)[T/υ<sub>1</sub>] = λυ<sub>1</sub> : κ′.U. This justifies the inclusion of set S in the renaming process. From now on, we assume that all types have gone through the renaming process.
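To make the renaming function concrete, here is a minimal Python sketch of the four equations of Fig. 5. The encoding is an assumption made for illustration: types are nested tuples, the well-ordered variables υ<sub>1</sub>, υ<sub>2</sub>, . . . are the strings `"v1"`, `"v2"`, . . ., and the helper `subst_var` performs the variable-for-variable substitution T[υ/α] in a capture-avoiding way.

```python
def fv(t):
    """Free type variables of a type."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    if tag == "lam":                      # ("lam", x, kind, body)
        return fv(t[3]) - {t[1]}
    return set()                          # ("const", name)


def first(avoid):
    """Smallest variable v1, v2, ... that is not in `avoid`."""
    i = 1
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"


def subst_var(t, x, y):
    """Replace the free occurrences of variable x in t by variable y."""
    tag = t[0]
    if tag == "var":
        return ("var", y) if t[1] == x else t
    if tag == "app":
        return ("app", subst_var(t[1], x, y), subst_var(t[2], x, y))
    if tag == "lam":
        _, b, k, body = t
        if b == x or x not in fv(body):
            return t                      # x is shadowed or does not occur
        if b == y:                        # binder would capture y: move it away
            z = first(fv(body) | {x, y})
            body, b = subst_var(body, b, z), z
        return ("lam", b, k, subst_var(body, x, y))
    return t


def rename(t, s=frozenset()):
    """rename_S(T), with S the set of variables unavailable for renaming."""
    tag = t[0]
    if tag == "app":
        return ("app", rename(t[1], s | fv(t[2])), rename(t[2], s))
    if tag == "lam":
        _, x, k, body = t
        v = first(s | fv(t))
        return ("lam", v, k, rename(subst_var(body, x, v), s))
    return t                              # constants and variables
```

On the two examples from the text, `rename` maps λα: t.λβ : s.β to λυ<sub>1</sub> : t.λυ<sub>1</sub> : s.υ<sub>1</sub> and υ<sub>1</sub> (λα: t.α) to υ<sub>1</sub> (λυ<sub>1</sub> : t.υ<sub>1</sub>); two types that differ only in the names of bound variables yield the same tuple.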

Next comes the notion of type reduction (Fig. 6). Apart from beta reduction (rule R-β), the definition provides for sequential composition, for unfolding recursive types and for reducing Dual T types. Note that renaming is further invoked in rule R-β, for beta reduction does not preserve renaming: consider the renamed type (λυ<sub>1</sub> : t.λυ<sub>2</sub> : t.υ<sub>1</sub> → υ<sub>2</sub>) Unit. The type resulting from the substitution (λυ<sub>2</sub> : t.υ<sub>1</sub> → υ<sub>2</sub>)[Unit/υ<sub>1</sub>] is λυ<sub>2</sub> : t.Unit → υ<sub>2</sub>, which is not renamed and, therefore, not equivalent to λυ<sub>1</sub> : t.Unit → υ<sub>1</sub> according to our rules in Section 4. Thanks to our modified rule R-β, we preserve renaming under reductions: if T = rename(T) and T −→ U then U = rename(U).

Type reduction T −→ T

$$\begin{array}{c}
\text{R-Seq1}\;\; \mathsf{Skip}; T \longrightarrow T
\qquad
\text{R-Seq2}\;\; \dfrac{T \longrightarrow V}{T; U \longrightarrow V; U}
\qquad
\text{R-Seq3}\;\; (T; U); V \longrightarrow T; (U; V)
\\[2ex]
\text{R-}\mu\;\; \mu_\kappa\, T \longrightarrow T\,(\mu_\kappa\, T)
\qquad
\text{R-}\beta\;\; (\lambda\alpha\colon\kappa.T)\,U \longrightarrow \mathsf{rename}(T[U/\alpha])
\qquad
\dfrac{T \longrightarrow U}{T\,V \longrightarrow U\,V}
\\[2ex]
\text{R-D;}\;\; \mathsf{Dual}(T; U) \longrightarrow \mathsf{Dual}\,T; \mathsf{Dual}\,U
\qquad
\text{R-DSkip}\;\; \mathsf{Dual}\,\mathsf{Skip} \longrightarrow \mathsf{Skip}
\qquad
\text{R-DEnd}\;\; \mathsf{Dual}\,\mathsf{End} \longrightarrow \mathsf{End}
\\[2ex]
\text{R-D?}\;\; \mathsf{Dual}(?T) \longrightarrow\; !T
\qquad
\text{R-D!}\;\; \mathsf{Dual}(!T) \longrightarrow\; ?T
\\[2ex]
\text{R-D\&}\;\; \mathsf{Dual}(\&\{l_i\colon T_i\}) \longrightarrow \oplus\{l_i\colon \mathsf{Dual}\,T_i\}
\qquad
\text{R-D}\oplus\;\; \mathsf{Dual}(\oplus\{l_i\colon T_i\}) \longrightarrow \&\{l_i\colon \mathsf{Dual}\,T_i\}
\\[2ex]
\dfrac{T \longrightarrow U}{\mathsf{Dual}\,T \longrightarrow \mathsf{Dual}\,U}
\qquad
\mathsf{Dual}(\mathsf{Dual}(\alpha\,T_1 \dots T_m)) \longrightarrow \alpha\,T_1 \dots T_m
\end{array}$$

Type formation ∆ ⊢ T : κ

$$\begin{array}{c}
\text{K-Const}\;\; \Delta \vdash \iota : \kappa_\iota
\qquad
\text{K-Var}\;\; \dfrac{\alpha\colon\kappa \in \Delta}{\Delta \vdash \alpha : \kappa}
\\[2ex]
\text{K-TAbs}\;\; \dfrac{\Delta + \alpha\colon\kappa \vdash T : \kappa'}{\Delta \vdash \lambda\alpha\colon\kappa.T : \kappa \Rightarrow \kappa'}
\qquad
\text{K-TApp}\;\; \dfrac{\Delta \vdash T : \kappa \Rightarrow \kappa' \quad \Delta \vdash U : \kappa \quad T\,U\ \mathsf{norm}}{\Delta \vdash T\,U : \kappa'}
\end{array}$$

We also need the notion of weak head normal form borrowed from the lambda calculus [9,10]. We say that a type T is in weak head normal form, T whnf, if it is irreducible, i.e., T ↛. Although this is a negative definition, in the technical report we provide an equivalent, rule-based characterisation of weak head normal form types, which can be used in a compiler as well as in our proofs. We say that type T normalises to type U, written T ⇓ U, if U whnf and U is reached from T in a finite number of reduction steps (note that any type which is already whnf normalises to itself). We write T norm to denote that T ⇓ U for some U.

For example, suppose we want to normalise the type µ<sub>s</sub> T, where T is the type λυ<sub>1</sub> : s.⊕{Done : End, More : !α}; Dual υ<sub>1</sub>. By computing all reductions from µ<sub>s</sub> T, we obtain µ<sub>s</sub> T −→ T (µ<sub>s</sub> T) −→ ⊕{Done : End, More : !α}; Dual(µ<sub>s</sub> T) ↛, from which we conclude that µ<sub>s</sub> T ⇓ ⊕{Done : End, More : !α}; Dual(µ<sub>s</sub> T). Similarly, we can reason that µ<sub>t</sub> (λυ<sub>1</sub> : t.υ<sub>1</sub>), µ<sub>s</sub> (λυ<sub>1</sub> : s.Skip; υ<sub>1</sub>) and µ<sub>s</sub> (λυ<sub>1</sub> : s.Dual υ<sub>1</sub>) are all examples of non-normalising expressions.
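The normalisation examples above can be replayed with a small Python sketch of reduction to weak head normal form. It is an illustration under stated assumptions, not the paper's algorithm: types are nested tuples, message and choice types are opaque constants (so the Dual rules for them are left out), Dual of an applied variable is not covered, the renaming step of rule R-β is omitted, and substitution is only safe when the argument does not clash with bound names. Because non-termination cannot be observed directly, `normalise` gives up after a hypothetical `fuel` bound and returns `None`.

```python
SKIP, END = ("skip",), ("end",)


def subst(t, x, u):
    """Replace free occurrences of variable x in t by type u (naive)."""
    tag = t[0]
    if tag == "var":
        return u if t[1] == x else t
    if tag == "lam":                          # ("lam", x, body); kinds omitted
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    if tag in ("seq", "app"):
        return (tag, subst(t[1], x, u), subst(t[2], x, u))
    if tag in ("mu", "dual"):
        return (tag, subst(t[1], x, u))
    return t                                  # skip, end, const


def step(t):
    """One reduction step, or None when t is irreducible (whnf)."""
    tag = t[0]
    if tag == "seq":
        left, right = t[1], t[2]
        if left == SKIP:                      # Skip; T --> T
            return right
        if left[0] == "seq":                  # (T; U); V --> T; (U; V)
            return ("seq", left[1], ("seq", left[2], right))
        reduct = step(left)                   # reduce under the left of ';'
        return None if reduct is None else ("seq", reduct, right)
    if tag == "mu":                           # mu T --> T (mu T)
        return ("app", t[1], t)
    if tag == "app":
        fun, arg = t[1], t[2]
        if fun[0] == "lam":                   # beta reduction, no renaming
            return subst(fun[2], fun[1], arg)
        reduct = step(fun)
        return None if reduct is None else ("app", reduct, arg)
    if tag == "dual":
        arg = t[1]
        if arg in (SKIP, END):                # Dual Skip, Dual End
            return arg
        if arg[0] == "seq":                   # Dual distributes over ';'
            return ("seq", ("dual", arg[1]), ("dual", arg[2]))
        if arg[0] == "dual" and arg[1][0] == "var":
            return arg[1]                     # Dual (Dual alpha) --> alpha
        reduct = step(arg)                    # reduce under Dual
        return None if reduct is None else ("dual", reduct)
    return None


def normalise(t, fuel=200):
    """Return the whnf of t, or None if none is found within `fuel` steps."""
    for _ in range(fuel):
        reduct = step(t)
        if reduct is None:
            return t
        t = reduct
    return None
```

Running out of fuel does not prove that a type is non-normalising; it only signals that no weak head normal form was reached within the chosen bound.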

Equipped with normalisation, we can introduce type formation, which we do via the rules in Fig. 7. Rule K-Const introduces constants as types whose kinds match those of Fig. 4. Rule K-Var reads the kind of a type variable from context ∆. An abstraction λα: κ.T is a well-formed type with kind κ ⇒ κ′ if T is well formed in context ∆ updated with entry α: κ (rule K-TAbs). The update is necessary since we are dealing with renamed types and the same type variable may appear with different kinds in nested abstractions.

It is not until we reach rule K-TApp that we find a proviso about the normalisation of a type. This is standard and analogous to a condition on contractivity. The goal is to eliminate types that reduce indefinitely without reaching a whnf.
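The four kinding rules admit a direct syntax-directed reading, sketched below in Python. Several points are assumptions made for illustration: kinds are the strings `"s"` and `"t"` or tuples `("=>", k1, k2)`; the table `CONST_KINDS` lists only a few constants with kinds inferred from the prose (Fig. 4 gives the authoritative list); and the premise "T U norm" of rule K-TApp is not checked.

```python
def arrow(k1, k2):
    """The kind k1 => k2 of a type operator."""
    return ("=>", k1, k2)


# Illustrative kinds for a handful of constants (an assumption, see above).
CONST_KINDS = {
    "Skip": "s",
    "End": "s",
    "Dual": arrow("s", "s"),
    ";": arrow("s", arrow("s", "s")),
}


def kind_of(delta, t):
    """Compute the kind of type t in kinding context delta, or raise TypeError."""
    tag = t[0]
    if tag == "const":                                   # K-Const
        return CONST_KINDS[t[1]]
    if tag == "var":                                     # K-Var
        if t[1] not in delta:
            raise TypeError(f"unbound type variable {t[1]}")
        return delta[t[1]]
    if tag == "lam":                                     # K-TAbs
        _, x, k, body = t
        # {**delta, x: k} is the update delta + x: k, overriding older entries.
        return arrow(k, kind_of({**delta, x: k}, body))
    if tag == "app":                                     # K-TApp, without 'norm'
        k_fun = kind_of(delta, t[1])
        k_arg = kind_of(delta, t[2])
        if isinstance(k_fun, tuple) and k_fun[1] == k_arg:
            return k_fun[2]
        raise TypeError("ill-kinded application")
    raise TypeError(f"unknown type form {tag}")
```

The dictionary update in the abstraction case plays the role of ∆ + α: κ: in a renamed type such as λυ<sub>1</sub> : t.λυ<sub>1</sub> : s.υ<sub>1</sub>, the inner binding of υ<sub>1</sub> replaces the outer one.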

Theorem 1. Let ∆ ⊢ T : κ.

Preservation. If T −→ U, then ∆ ⊢ U : κ.

Confluence. If T −→ U and T −→ V, then U −→<sup>∗</sup> W and V −→<sup>∗</sup> W for some type W.

Weak normalisation. T ⇓ U for some U. Furthermore, if T ⇓ V, then U = V.

We finally arrive at the main decidability result in this section. In its proof, we make use of the fact that recursion is restricted to kind ∗ to limit the possible subexpressions of the form µ<sub>∗</sub> U that might appear in the normalisation of T.

Theorem 2 (Decidability of type formation). ∆ ⊢ T : κ is decidable for types in F<sup>µ∗;</sup><sub>ω</sub>.

# 4 Type equivalence

This section introduces type bisimulation as our notion of type equivalence. We define a labelled transition system (LTS) on the space of all types and write T <sup>a</sup>−→ U to denote that T has a transition by label a to U. The grammar for labels and the LTS rules are in Fig. 8.

If T is not in weak head normal form, then we must normalise it to some type U, so that T has the same transitions as U (rule L-Red). Otherwise if T whnf, then the transitions of T can be immediately derived by looking at the corresponding rule for T as follows. If T is a variable, use rule L-Var1 (with m = 0). If T is a constant (other than Skip), use rule L-Const. Note that if T is a lone Skip, then it has no transitions. If T is an abstraction, use rule L-Abs.

If T is an application, then we need to look inside the head. We write T as T<sub>0</sub> T<sub>1</sub> . . . T<sub>m</sub> with m ≥ 1 where T<sub>0</sub> is not an application, and look at T<sub>0</sub>. If T<sub>0</sub> is a variable, use rules L-Var1 and L-Var2. If T<sub>0</sub> is one of the constants →, ∀<sub>κ</sub>, {l<sub>i</sub>} or ⟨l<sub>i</sub>⟩, use rule L-ConstApp. Note that T<sub>0</sub> is neither an abstraction nor µ<sub>κ</sub>, since T is in weak head normal form. If T<sub>0</sub> is ♯, we use rules L-Msg1 and L-Msg2. If T<sub>0</sub> is Dual, then the only way for T to be well-formed and in weak head normal form is if m = 1 and T<sub>1</sub> is α or α U<sub>1</sub> . . . U<sub>m</sub>, in which case we use rules L-DualVar1 and L-DualVar2.

If T<sub>0</sub> is ;, we require an additional case analysis on T<sub>1</sub>. If m = 1, use rule L-Seq1. Otherwise m = 2 due to kinding. If T<sub>1</sub> is a variable, use rule L-VarSeq1 (with m = 0). If T<sub>1</sub> is a constant, then it must be of kind s. T<sub>1</sub> cannot be Skip, because T is in weak head normal form, so it must be End, in which case we use rule L-EndSeq (End is an absorbing element, so End;U simply makes a transition to Skip without executing U). Note that T<sub>1</sub> cannot be an abstraction due to kinding.


Fig. 8: Labelled transition system for types.

If T<sub>1</sub> is an application, then again we write T<sub>1</sub> as U<sub>0</sub> U<sub>1</sub> . . . U<sub>n</sub> with n ≥ 1 where the head U<sub>0</sub> is not an application, and look at U<sub>0</sub>. If U<sub>0</sub> is a variable, use rules L-VarSeq1 and L-VarSeq2. If U<sub>0</sub> is a constant, it must be one of ;, µ<sub>κ</sub>, ♯, {l<sub>i</sub>} or Dual due to kinding. If U<sub>0</sub> is ♯, use rules L-MsgSeq1 and L-MsgSeq2. If U<sub>0</sub> is {l<sub>i</sub>}, use rule L-ChoiceSeq. If U<sub>0</sub> is Dual, the only way for T to be well-formed and in weak head normal form is if n = 1 and U<sub>1</sub> is α or α V<sub>1</sub> . . . V<sub>ℓ</sub>, in which case we use rules L-DualSeq1 and L-DualSeq2. Note that U<sub>0</sub> cannot be ;, µ<sub>κ</sub> or an abstraction, since T is in weak head normal form.

Let us clarify our LTS rules with an example. Consider the type λυ<sub>1</sub> : t.µ υ<sub>2</sub> : s.⊕{Done : End, More : !υ<sub>1</sub>}; Dual υ<sub>2</sub> and call it T. T is a type abstraction (on type variable υ<sub>1</sub>), of kind t ⇒ s. It specifies a channel alternating between: select a choice and output a value of type υ<sub>1</sub>; or offer a choice and input a value of type υ<sub>1</sub>. The polarity is swapped thanks to the application of constant Dual to the recursion variable υ<sub>2</sub>. To construct the (fragment of the) LTS generated by this type, let us first desugar T into λυ<sub>1</sub> : t.U where U is the type µ<sub>s</sub> (λυ<sub>2</sub> : s.⊕{Done : End, More : !υ<sub>1</sub>}; Dual υ<sub>2</sub>). Notice that U normalises to ⊕{Done : End, More : !υ<sub>1</sub>}; Dual U. The LTS for the example is sketched in Fig. 9. In this case, only finitely many types appear. However, more elaborate examples involving sequential composition or higher-order recursion may lead to an infinite graph of transitions.

Fig. 9: The LTS for type λυ<sub>1</sub> : t.U. Normalisation T<sub>1</sub> ⇓ T<sub>2</sub> is represented as T<sub>1</sub> ⇒ T<sub>2</sub> and U is a shorthand for type µ<sub>s</sub> (λυ<sub>2</sub> : s.⊕{Done : End, More : !υ<sub>1</sub>}; Dual υ<sub>2</sub>).

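Given the LTS rules, we can define, in the standard way, a notion of bisimulation. A binary relation R on types is called a bisimulation if, for every (T,U) ∈ R and every transition label a:

1. if T <sup>a</sup>−→ T′, then there exists U′ such that U <sup>a</sup>−→ U′ and (T′,U′) ∈ R;
2. if U <sup>a</sup>−→ U′, then there exists T′ such that T <sup>a</sup>−→ T′ and (T′,U′) ∈ R.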


We say that types T and U are bisimilar, written T ∼ U, if there exists a bisimulation R such that (T,U) ∈ R.
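When the LTS is finite and deterministic, bisimilarity can be checked by exploring pairs of states, as in the Python sketch below. This is an illustration under assumptions: the LTS is given explicitly as a dictionary from states to `{label: successor}` maps (a hypothetical encoding), whereas the LTS on types may be infinite, which is why the paper resorts to grammars.

```python
def bisimilar(lts, p, q):
    """Decide bisimilarity of states p and q in a finite deterministic LTS.

    The pairs visited form a candidate bisimulation: the check fails as soon
    as two paired states disagree on their sets of outgoing labels.
    """
    seen = set()
    todo = [(p, q)]
    while todo:
        a, b = todo.pop()
        if (a, b) in seen:
            continue                    # pair already assumed to be related
        seen.add((a, b))
        out_a, out_b = lts[a], lts[b]
        if out_a.keys() != out_b.keys():
            return False                # one side has a label the other lacks
        # Determinism: each label has exactly one successor on either side.
        todo.extend((out_a[label], out_b[label]) for label in out_a)
    return True
```

If the function returns `True`, the set `seen` is itself a bisimulation containing (p, q), which matches the definition above.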

Intuitively, a notion of type equivalence must preserve and reflect the syntax of type constructors: for example, a type T → U is equivalent to a type T′ → U′ iff T, T′ are equivalent and U, U′ are equivalent. Using the bisimulation technique, we achieve this by considering a labelled transition system on types: T → U has a transition labelled →<sub>1</sub> to T and a transition labelled →<sub>2</sub> to U. In this way, T → U can only be equivalent to another type which has two transitions with those same labels. For each of the type constructors (→, ∀<sub>κ</sub>, !, ?, {l<sub>i</sub>}, and so on) we have suitable transition rules. Moreover, a type sometimes needs to be reduced before a type constructor is found at the root of the syntax tree. If T normalises to U, then we expect T and U to be bisimilar, which is achieved thanks to rule L-Red. This handles the various reductions: beta-reductions arising from lambda-abstraction and applications (e.g., (λα: κ.T) U reduces to rename(T[U/α])), reductions arising from the monoidal structure of sequential composition (e.g., Skip; T reduces to T), reductions arising from the internalisation of duality as a type constructor (e.g., Dual(!T) reduces to ?T) and reductions arising from recursion (e.g., µ<sub>κ</sub> T reduces to T (µ<sub>κ</sub> T)).

Our notion of type equivalence enjoys natural properties and behaves as expected with respect to the notions of reduction, normalisation and kinding from Section 3. We can derive rules for type equivalence, that could be used to define another coinductive notion of equivalence, via effective syntax-directed rules. We can show that type equivalence is preserved under renaming, reduction and normalisation. We can also show that the axioms for sequential composition in the introduction (1) are derivable from our notion of bisimulation. These additional results are presented in the technical report [20].

# 5 Decidability of type equivalence

This section presents results on decidability of type equivalence. Our approach consists in translating types to objects in some computational model. We look at finite-state automata (for types in F<sup>µ</sup>, F<sup>µ∗</sup><sub>ω</sub>, F<sup>µ·</sup>, and F<sup>µ∗·</sup><sub>ω</sub>), simple grammars (for types in F<sup>µ;</sup> and F<sup>µ∗;</sup><sub>ω</sub>) and deterministic pushdown automata (for types in F<sup>µ</sup><sub>ω</sub>, F<sup>µ·</sup><sub>ω</sub> and F<sup>µ;</sup><sub>ω</sub>).

We say that a grammar in Greibach normal form is a tuple (𝒯, 𝒩, γ, ℛ) where: 𝒯 is a set of terminal symbols, denoted by a, b, c; 𝒩 is a set of nonterminal symbols, denoted by X, Y, Z; γ ∈ 𝒩<sup>∗</sup> is the starting word; and ℛ ⊆ 𝒩 × 𝒯 × 𝒩<sup>∗</sup> is a set of productions. A grammar is said to be simple if, for every nonterminal X and every terminal a, there is at most one production (X, a, δ) ∈ ℛ [51].

Greek letters γ and δ denote (possibly empty) words of nonterminal symbols. Productions are written as X <sup>a</sup>−→ δ. We define a notion of bisimulation for grammars via a labelled transition system. The system comprises a set of states 𝒩<sup>∗</sup> corresponding to words of nonterminal symbols. For each production X <sup>a</sup>−→ γ and each word of nonterminal symbols δ, we have a labelled transition Xδ <sup>a</sup>−→ γδ. We let ≈ denote the bisimulation relation for grammars (the definition is similar to that in Section 4).

For the moment we focus on the class F<sup>µ∗;</sup><sub>ω</sub> and we explain how to convert a type T into a simple grammar (𝒯<sub>T</sub>, 𝒩<sub>T</sub>, word(T), ℛ<sub>T</sub>). The conversion is based on a function word(T) that maps each type T into a word of nonterminal symbols, while introducing fresh nonterminals and productions. In our construction, following the approach by Costa et al. [19], we use a nonterminal symbol with no productions, denoted by ⊥, in order to separate the two descendants of a send/receive operation such as !T;U. The sequence of nonterminal symbols word(T) is defined as follows. First consider the cases in which T whnf.


Finally, let us handle the cases where T is not in weak head normal form.


In the above construction, we create fresh symbols each time we encounter a weak head normal form other than Skip. In other words, 𝒩<sub>T</sub> is the set containing ⊥ and all nonterminals Y created during the computation of word(T). Another key insight is that the sequential composition of types is translated into a concatenation of words: word(T<sub>1</sub>; T<sub>2</sub>; . . . ; T<sub>n</sub>) = word(T<sub>1</sub>) word(T<sub>2</sub>) . . . word(T<sub>n</sub>). This allows our construction to terminate: even if the transitions lead to infinitely many types, they are split on the sequential composition operator, and so we only need to consider finitely many subexpressions.

For the last case in our construction to be well-defined, i.e., when T ⇓ U ≠ Skip, we require word(U) to be non-empty. Indeed, if U whnf, then we can observe (by inspecting all cases) that word(U) = ε iff U = Skip. We also need to argue that the construction of word(T) eventually terminates. For this, we keep track of all types visited during the construction, and we only add a fresh nonterminal Y to our grammar if the type visited is syntactically different from all types visited so far. Therefore, we reuse the same symbol Y with the same productions each time we revisit a type. With all these observations, we get the following result.

Lemma 1. Suppose that T ∈ F<sup>µ∗;</sup><sub>ω</sub>. Then the construction of word(T) terminates producing a simple grammar.

We illustrate the above construction with the polymorphic tree exchanging example from Section 2,

```
type TreeC a = &{ Leaf : Skip , Node : TreeC a ; ? a ; TreeC a }
```
that is written in F<sup>µ∗;</sup><sub>ω</sub> as T<sub>0</sub> = λυ<sub>1</sub> : t.µ υ<sub>2</sub> : s. &{Leaf : Skip, Node : υ<sub>2</sub>; ?υ<sub>1</sub>; υ<sub>2</sub>}. For ease of notation, in this example we write &<sub>i</sub> as shorthand for &{Leaf, Node}<sub>i</sub>. Since T<sub>0</sub> is in weak head normal form, word(T<sub>0</sub>) returns a fresh symbol, which we call X<sub>0</sub>. We also have a production X<sub>0</sub> <sup>λυ<sub>1</sub> : t</sup>−→ word(T<sub>1</sub>), where T<sub>1</sub> is the type µ υ<sub>2</sub> : s. &{Leaf : Skip, Node : υ<sub>2</sub>; ?υ<sub>1</sub>; υ<sub>2</sub>}. Since T<sub>1</sub> is not in whnf, we must normalise it, to get T<sub>2</sub> = &{Leaf : Skip, Node : T<sub>1</sub>; ?υ<sub>1</sub>; T<sub>1</sub>}. Therefore word(T<sub>1</sub>) returns a fresh symbol, which we call X<sub>1</sub>. To obtain the transitions of X<sub>1</sub>, we must first compute word(T<sub>2</sub>), which is a fresh symbol X<sub>2</sub> with transitions X<sub>2</sub> <sup>&<sub>1</sub></sup>−→ word(Skip) and X<sub>2</sub> <sup>&<sub>2</sub></sup>−→ word(T<sub>1</sub>; ?υ<sub>1</sub>; T<sub>1</sub>). Thus we also get X<sub>1</sub> <sup>&<sub>1</sub></sup>−→ word(Skip) and X<sub>1</sub> <sup>&<sub>2</sub></sup>−→ word(T<sub>1</sub>; ?υ<sub>1</sub>; T<sub>1</sub>).

We have word(Skip) = ε, but we still need to compute word(T<sub>1</sub>; ?υ<sub>1</sub>; T<sub>1</sub>). This type normalises to T<sub>3</sub> = T<sub>2</sub>; ?υ<sub>1</sub>; T<sub>1</sub> since T<sub>1</sub> ⇓ T<sub>2</sub>. Thus word(T<sub>1</sub>; ?υ<sub>1</sub>; T<sub>1</sub>) is a fresh symbol X<sub>3</sub>. To obtain the productions of X<sub>3</sub> we must compute word(T<sub>2</sub>; ?υ<sub>1</sub>; T<sub>1</sub>) = word(T<sub>2</sub>) word(?υ<sub>1</sub>) word(T<sub>1</sub>). At this point we already have word(T<sub>1</sub>) = X<sub>1</sub> and word(T<sub>2</sub>) = X<sub>2</sub>. We still need to compute word(?υ<sub>1</sub>), which is a fresh symbol X<sub>4</sub> with productions X<sub>4</sub> <sup>?<sub>1</sub></sup>−→ word(υ<sub>1</sub>)⊥ and X<sub>4</sub> <sup>?<sub>2</sub></sup>−→ ε. In turn, word(υ<sub>1</sub>) is a fresh symbol X<sub>5</sub> with a production X<sub>5</sub> <sup>υ<sub>1</sub></sup>−→ ε. Finally, we get word(T<sub>2</sub>; ?υ<sub>1</sub>; T<sub>1</sub>) = X<sub>2</sub>X<sub>4</sub>X<sub>1</sub>, which means we can write the productions for X<sub>3</sub>: X<sub>3</sub> <sup>&<sub>1</sub></sup>−→ X<sub>4</sub>X<sub>1</sub> and X<sub>3</sub> <sup>&<sub>2</sub></sup>−→ X<sub>3</sub>X<sub>4</sub>X<sub>1</sub>.

Putting all this together, we can finally obtain the simple grammar:

$$\begin{array}{lllll} X_0 \stackrel{\lambda\upsilon_1 \colon \mathsf{t}}{\longrightarrow} X_1 & X_1 \stackrel{\&_1}{\longrightarrow} \varepsilon & X_1 \stackrel{\&_2}{\longrightarrow} X_3 & X_2 \stackrel{\&_1}{\longrightarrow} \varepsilon & X_2 \stackrel{\&_2}{\longrightarrow} X_3 \\ X_3 \stackrel{\&_1}{\longrightarrow} X_4 X_1 & X_3 \stackrel{\&_2}{\longrightarrow} X_3 X_4 X_1 & X_4 \stackrel{?_1}{\longrightarrow} X_5 \bot & X_4 \stackrel{?_2}{\longrightarrow} \varepsilon & X_5 \stackrel{\upsilon_1}{\longrightarrow} \varepsilon \end{array}$$
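The LTS induced by this grammar, with transitions Xδ <sup>a</sup>−→ γδ, can be explored with a few lines of Python. The sketch below is illustrative: the productions are the ones derived in the text for the tree channel, while the encoding (words as tuples of nonterminal names, labels as plain strings) and the name `transitions` are assumptions.

```python
# Productions (X, a) -> gamma of the simple grammar obtained for TreeC.
PRODUCTIONS = {
    ("X0", "λυ1:t"): ("X1",),
    ("X1", "&1"): (),
    ("X1", "&2"): ("X3",),
    ("X2", "&1"): (),
    ("X2", "&2"): ("X3",),
    ("X3", "&1"): ("X4", "X1"),
    ("X3", "&2"): ("X3", "X4", "X1"),
    ("X4", "?1"): ("X5", "⊥"),
    ("X4", "?2"): (),
    ("X5", "υ1"): (),
}


def transitions(word):
    """Outgoing transitions of a word of nonterminals, as {label: next word}.

    Only the first nonterminal X of the word X delta may fire: each production
    X -a-> gamma yields the transition X delta -a-> gamma delta.
    """
    if not word:
        return {}                       # the empty word has no transitions
    head, rest = word[0], word[1:]
    return {
        label: gamma + rest
        for (nonterminal, label), gamma in PRODUCTIONS.items()
        if nonterminal == head
    }
```

Since the grammar is simple, the dictionary returned by `transitions` has at most one entry per label; words starting with ⊥ are stuck, which is how the two descendants of a message are kept apart.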

Next, we argue that type equivalence (i.e., bisimilarity on types) corresponds to bisimilarity on the corresponding grammars. This is achieved by the following lemma, that asserts that the LTS of a type and the LTS of the corresponding word of nonterminals have exactly the same transitions.

Lemma 2 (Full abstraction). Let T ∈ F<sup>µ∗;</sup><sub>ω</sub> and (𝒯<sub>T</sub>, 𝒩<sub>T</sub>, word(T), ℛ<sub>T</sub>) the corresponding simple grammar. Suppose also that word(T) ≈ γ.

1. If T <sup>a</sup>−→ U then there exists γ′ such that γ <sup>a</sup>−→ γ′ and word(U) ≈ γ′.
2. If γ <sup>a</sup>−→ γ′ then there exists U such that T <sup>a</sup>−→ U and word(U) ≈ γ′.

As a consequence of the above result, we get soundness and completeness of the bisimilarity word(T) ≈ word(U) with respect to the bisimilarity T ∼ U. Indeed by Lemma 2, any sequence of transitions starting from T can be matched by a sequence of transitions starting from word(T); and similarly for U. Thus T ∼ U iff word(T) ≈ word(U).

Theorem 3. The type equivalence problem is decidable for types in F<sup>µ∗;</sup><sub>ω</sub>.

For the remainder of this section, we look at the other classes of types in Fig. 2 and examine the computation models they correspond to. Since class F<sup>µ;</sup> is contained in F<sup>µ∗;</sup><sub>ω</sub>, we can express types without λ-abstractions with simple grammars as well. In this way we recover previous results in the literature [4,19].

Let us now look at the class F<sup>µ∗·</sup><sub>ω</sub>. In this class we have neither Skip nor sequential composition, and message operators are binary (♯T.U) rather than unary. Since we do not have sequential composition, there is no need to consider words of nonterminals, and instead it suffices to translate types into single symbols, i.e., states in an automaton. Moreover, since there is no recursion beyond µ<sub>κ</sub>, only finitely many types can be reached from a given T. We can thus adapt our construction as follows for F<sup>µ∗·</sup><sub>ω</sub>. In the definition of the LTS (Fig. 8):


Also replace the construction of word(T) by a construction of state(T), associating to each type T a state in a finite-state automaton. For each transition T <sup>a</sup>−→ U we have the corresponding transition state(T) <sup>a</sup>−→ state(U). Notice that the resulting automaton is deterministic since the original LTS is also deterministic (for each type T and label a, there is at most one transition T <sup>a</sup>−→ U). Since bisimilarity of deterministic finite-state automata can be decided in polynomial time [44], we get the following results.

Theorem 4.


Clearly, Theorem 4 applies to the subclasses of F<sup>µ∗·</sup><sub>ω</sub>: F<sup>µ</sup>, F<sup>µ·</sup> and F<sup>µ∗</sup><sub>ω</sub>. In this way we recover previous results in the literature [14,19,33].

Finally, we consider the classes F<sup>µ</sup><sub>ω</sub>, F<sup>µ·</sup><sub>ω</sub> and F<sup>µ;</sup><sub>ω</sub> involving arbitrarily-kinded recursion. We shall show that these classes are already powerful enough to simulate deterministic pushdown automata; hence, the type equivalence problem becomes impractical (i.e., no practical implementation of an algorithm is known). We only focus on the simplest case F<sup>µ</sup><sub>ω</sub>, as the other two classes are even more expressive. Instead of looking at deterministic pushdown automata, we look at deterministic first-order grammars, which constitute an equivalent model of computation [46]. This choice simplifies our construction. We say that a first-order grammar is a tuple (𝒳, 𝒯, 𝒩, E, ℛ) where:


A first-order grammar is deterministic if, for every X and a, there is at most one production (X, a, E) ∈ R.

Just as a simple grammar defines an LTS over words of nonterminals, a first-order grammar defines an LTS over the set $\mathcal{E}_0$ of closed expressions. For each production $X\,\alpha_1 \dots \alpha_m \xrightarrow{a} E$ we have the labelled transition $X\,E_1 \dots E_m \xrightarrow{a} E[E_1/\alpha_1, \dots, E_m/\alpha_m]$.

Let ≈ denote bisimilarity over closed expressions according to a first-order grammar. We now present a fully abstract (i.e., preserving bisimilarity) translation of a deterministic first-order grammar into a type in $F^{\mu}_\omega$. Each grammar variable α has a corresponding type variable α (of kind t). An expression $X\,E_1 \dots E_m$ is represented as a type application $X\,E_1 \dots E_m$. If X has arity m and productions $X\,\alpha_1 \dots \alpha_m \xrightarrow{a_j} E_j$ for j ranging over $1, \dots, k$, then we write the equation specifying X as a record (since the first-order grammar is deterministic, all record labels are distinct, and thus the right-hand side of the equation specifying X is well-formed).

$$X \doteq \lambda \alpha_1 \colon \mathbf{T} \dots \lambda \alpha_m \colon \mathbf{T} .\ \{ a_1 \colon E_1, \dots, a_k \colon E_k \}$$

This gives rise to a system of equations $\{X_i \doteq T_i\}$, one for each nonterminal $X_i$, where the nonterminals may appear in the right-hand sides $T_i$. Finally, given an initial expression E, it is standard how to convert it into a µ-type using the system above.

Using the above translation, we are able to simulate a transition E aj −→ F of the first-order grammar as a transition E {ai}<sup>j</sup> −→ F on the corresponding types. Therefore, the translation is fully abstract and we get the following result.

Theorem 5. Let E and F be closed expressions on a first-order grammar and E, F the corresponding types. Then E ≈ F iff E ∼ F.

Let us work on an example to better understand the above translation. Consider the language $L_3 = \{\ell^n a r^n a \mid n \geq 0\} \cup \{\ell^n b r^n b \mid n \geq 0\}$ over the alphabet $\{a, b, \ell, r\}$. $L_3$ is a typical example of a language that cannot be described with a simple grammar, but can be accepted by a deterministic pushdown automaton [51]. Consider the first-order grammar with nonterminals X, R, A, B, ⊥, initial expression X A B, and productions

$$\begin{array}{lll} X\,\alpha\,\beta \xrightarrow{\ell} X\,(R\,\alpha)\,(R\,\beta) & X\,\alpha\,\beta \xrightarrow{a} \alpha & X\,\alpha\,\beta \xrightarrow{b} \beta \\ R\,\alpha \xrightarrow{r} \alpha & A \xrightarrow{a} \bot & B \xrightarrow{b} \bot \end{array}$$

Note that ⊥ is a constant without productions. It is easy to see that the traces of this first-order grammar correspond exactly to the words in L3. By following the steps in the above translation, we arrive at the system of equations

$$\begin{aligned} X &\doteq \lambda \alpha \colon \mathbf{T}.\ \lambda \beta \colon \mathbf{T}.\ \{\ell \colon X\,(R\,\alpha)\,(R\,\beta),\ a \colon \alpha,\ b \colon \beta\} \\ R &\doteq \lambda \alpha \colon \mathbf{T}.\ \{r \colon \alpha\} \\ A &\doteq \{a \colon \bot\} \qquad B \doteq \{b \colon \bot\} \qquad \bot \doteq \{\} \end{aligned}$$

Therefore, the initial expression X A B becomes the type

$$(\mu\,\xi \colon \mathbf{T} \Rightarrow \mathbf{T} \Rightarrow \mathbf{T}.\ \lambda \alpha \colon \mathbf{T}.\ \lambda \beta \colon \mathbf{T}.\ \{\ell \colon \xi\,\{r \colon \alpha\}\,\{r \colon \beta\},\ a \colon \alpha,\ b \colon \beta\})\ \{a \colon \{\}\}\ \{b \colon \{\}\},$$

whose transitions simulate the transitions of the first-order grammar.
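To make the example concrete, the following Python sketch simulates the transition relation of the grammar above by substitution. It is an illustration only: the tuple encoding of expressions, the names `PRODUCTIONS`, `step` and `accepts`, the ASCII label `l` standing for $\ell$, and the convention that a word is accepted when it drives X A B to ⊥ are all assumptions made for this sketch.

```python
# Expressions are tuples: ("var", name) or (nonterminal, arg_1, ..., arg_m).
PRODUCTIONS = {
    # (nonterminal, label): (parameters, right-hand side)
    ("X", "l"): (("alpha", "beta"),
                 ("X", ("R", ("var", "alpha")), ("R", ("var", "beta")))),
    ("X", "a"): (("alpha", "beta"), ("var", "alpha")),
    ("X", "b"): (("alpha", "beta"), ("var", "beta")),
    ("R", "r"): (("alpha",), ("var", "alpha")),
    ("A", "a"): ((), ("Bot",)),
    ("B", "b"): ((), ("Bot",)),
}


def subst(expr, env):
    """Replace grammar variables in expr by the closed expressions in env."""
    if expr[0] == "var":
        return env[expr[1]]
    return (expr[0],) + tuple(subst(arg, env) for arg in expr[1:])


def step(expr, label):
    """Return the unique successor of a closed expression, or None."""
    production = PRODUCTIONS.get((expr[0], label))
    if production is None:
        return None
    params, rhs = production
    return subst(rhs, dict(zip(params, expr[1:])))


def accepts(word, start=("X", ("A",), ("B",))):
    """Check whether word drives the initial expression X A B to Bot."""
    expr = start
    for label in word:
        expr = step(expr, label)
        if expr is None:
            return False
    return expr == ("Bot",)


print(accepts("llarra"), accepts("lbrb"), accepts("larb"))
```

The word `llarra` pushes two copies of R with the two `l` steps, selects the first argument with `a`, pops the two copies with `r`, and finishes with `a`; the word `larb` gets stuck because the expression A has no `b` transition.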

```
v ::= c | x | λx: T.t | rec x: T.v | Λα: κ.v | {li = vi} | ⟨l = v⟩ as T
    | receive[T] | receive[T][T] | send[T] | send[T] v | send[T] v [T]
t ::= v | t t | t[T] | {li = ti} | let {li = xi} = t in t
    | ⟨l = t⟩ as T | case t of t | match t with t
p ::= ⟨t⟩ | p | p | (νxy)p

c ::=                                                   Term constant
    receive                  ∀α: t. ∀β: s. ?α.β → α ⊗ β   receive on a channel
    send                     ∀α: t. α → ∀β: s. !α.β → β   send on a channel
    select lj as ⊕{li : Ti}  ⊕{li : Ti} → Tj              internal choice
    close                    End → Unit                   channel close
    fork                     (Unit → Unit) → Unit         fork a new thread
    new                      ∀α: s. α ⊗ Dual α            channel creation
```
Fig. 10: Terms and types for term constants.

# 6 The term language and its metatheory

This section briefly introduces a concurrent functional language equipped with $F^{\mu_*;}_\omega$ types, together with its metatheory. The results mostly follow from those in the literature, although explicit recursion at the term level and the unrestricted bindings in typing contexts are somewhat new in session types. The complete set of rules can be found in the technical report [20].

The syntax of terms and processes is defined by the grammar in Fig. 10. The same figure introduces types for the constants. The term language is essentially the polymorphic lambda calculus with support for session operators, formulated as in Almeida et al. and Cai et al. [2,14]. From System F it comprises term and type abstractions, records and variants, including constructors and destructors in each case. The support for session operations and concurrency includes channel creation (new), the different channel operations (receive, send, match, select and close) and thread creation (fork). We program at the term level and use processes only for the runtime. Processes include terms as threads, parallel composition and channel creation, all inspired by the pi-calculus with double binders [73].

Process typing and an excerpt of term typing are in Fig. 11. A judgement of the form $\Delta \mid \Gamma \vdash t : T$ records the fact that term t has type T under contexts ∆ (recording kinds for type variables) and Γ (recording types for term variables). The judgement for processes, $\Gamma \vdash p$, says that p is well-typed under context Γ. It simplifies that for terms, since processes feature no free type variables and are assigned no particular type. Once again, the rules are adapted from the two above cited works. The difference to Cai et al. is that we work in a linear setting and hence axioms (T-Const and T-Var) work on an empty context, and most of the other rules must split the context accordingly. Rule T-TAbs simplifies that of Cai et al.; we can easily show that both rules are interchangeable. We support exponentials [37] for recursive functions, so that one may write functions that feature more than one recursive call (good for consuming binary trees, for example) and branches that do not use the recursive function (for code that is supposed to terminate). Towards this end, we add an unrestricted binding $x :^{\omega} T$ in term variable contexts, an explicit rule for rec (as opposed to making rec a constant as in Cai et al. [14]) and substructural rules for unrestricted bindings (T-Dereliction, T-Weakening and T-Contraction).

Term typing $\Delta \mid \Gamma \vdash t : T$

$$\frac{\Delta \vdash T_c : *}{\Delta \mid \cdot \vdash c : T_c}\ \text{T-Const} \qquad \frac{}{\Delta \mid x \colon T \vdash x : T}\ \text{T-Var} \qquad \frac{\Delta \mid \Gamma_1 \vdash t_1 : U \to T \quad \Delta \mid \Gamma_2 \vdash t_2 : U}{\Delta \mid \Gamma_1, \Gamma_2 \vdash t_1\,t_2 : T}\ \text{T-App}$$

$$\frac{\Delta \vdash T : * \quad \Delta \mid \Gamma, x :^{\omega} T \to U \vdash v : T \to U}{\Delta \mid \Gamma \vdash \mathsf{rec}\ x \colon T \to U.\,v : T \to U}\ \text{T-Rec} \qquad \frac{\Delta, \alpha \colon \kappa \mid \Gamma \vdash v : T\,\alpha}{\Delta \mid \Gamma \vdash (\Lambda \alpha \colon \kappa.\,v) : \forall_\kappa\,T}\ \text{T-TAbs}$$

$$\frac{\Delta \mid \Gamma_1 \vdash t_1 : \&\{l_i \colon T_i\} \quad \Delta \mid \Gamma_2 \vdash t_2 : \{l_i \colon T_i \to T\}}{\Delta \mid \Gamma_1, \Gamma_2 \vdash \mathsf{match}\ t_1\ \mathsf{with}\ t_2 : T}\ \text{T-Match} \qquad \frac{\Delta \mid \Gamma \vdash t : U \quad \Delta \vdash U : * \quad U \sim T}{\Delta \mid \Gamma \vdash t : T}\ \text{T-Eq}$$

$$\frac{\Delta \mid \Gamma, x \colon T \vdash t : U}{\Delta \mid \Gamma, x :^{\omega} T \vdash t : U}\ \text{T-Dereliction} \qquad \frac{\Delta \mid \Gamma \vdash t : U}{\Delta \mid \Gamma, x :^{\omega} T \vdash t : U}\ \text{T-Weakening} \qquad \frac{\Delta \mid \Gamma, y :^{\omega} T, z :^{\omega} T \vdash t : U}{\Delta \mid \Gamma, x :^{\omega} T \vdash t[x/y][x/z] : U}\ \text{T-Contraction}$$

Process typing $\Gamma \vdash p$

$$\frac{\varepsilon \mid \Gamma \vdash t : \mathsf{Unit}}{\Gamma \vdash \langle t \rangle} \qquad \frac{\Gamma_1 \vdash p_1 \quad \Gamma_2 \vdash p_2}{\Gamma_1, \Gamma_2 \vdash p_1 \mid p_2} \qquad \frac{\Gamma, x \colon T, y \colon \mathsf{Dual}\ T \vdash p}{\Gamma \vdash (\nu xy)p}$$

Fig. 11: Typing (excerpt).

Thanks to the power of System F, most of the session and concurrency operators are expressed as constants. For example, receive takes a channel of session type ?α.β, where the payload α is an arbitrary type and the continuation β is a session type, and returns a pair of the value received and the continuation channel. As usual, ∀α: κ. T abbreviates the type $\forall_\kappa\,(\lambda \alpha \colon \kappa.\,T)$. The exception is the external choice (T-Match), which cannot be captured by a type (similarly to T-Case) and hence requires a dedicated typing rule.
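The shape of these constants can be illustrated with a small untyped Python sketch. It is not the semantics of the paper: Python enforces neither linearity nor session types, the `Endpoint` class and `demo` function are hypothetical, and `fork` returns the thread handle (rather than Unit) only so that the example can wait for it. The point is merely that `send` returns the continuation channel and `receive` returns a pair of the received value and the continuation channel, mirroring the types in Fig. 10.

```python
import queue
import threading


class Endpoint:
    """One end of a channel; its peer is set by new()."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.peer = None


def new():
    """Create a channel and return its two endpoints."""
    x, y = Endpoint(), Endpoint()
    x.peer, y.peer = y, x
    return x, y


def send(value, chan):
    """Deliver value to the peer and return the continuation channel."""
    chan.peer.inbox.put(value)
    return chan


def receive(chan):
    """Block for a value and return it with the continuation channel."""
    value = chan.inbox.get()
    return value, chan


def fork(thunk):
    """Run thunk in a new thread (handle returned only for joining)."""
    thread = threading.Thread(target=thunk)
    thread.start()
    return thread


def demo():
    x, y = new()                 # intended protocols: x : !Int.?Int.End, y dual

    def worker():
        n, y1 = receive(y)
        send(n + 1, y1)

    thread = fork(worker)
    x1 = send(41, x)
    reply, _ = receive(x1)
    thread.join()
    return reply


print(demo())
```

In the typed language the continuation returned by each operation has a different type from the channel passed in, which is what lets the type system track protocol progress; the sketch reuses the same object and leaves that discipline to the programmer.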

Process reduction is in Fig. 12. Following Milner [55] we factor out processes by means of a structural congruence relation that accounts for the associative and commutative nature of parallel composition, scope extrusion and exchanging the order of channel bindings.

We now address the metatheory of our language, starting with preservation for both terms and processes.

Process reduction $p \to p$

$$\begin{array}{c} \dfrac{t_1 \to t_2}{\langle t_1 \rangle \to \langle t_2 \rangle} \qquad \langle E[\mathsf{fork}\ v] \rangle \to \langle E[\{\}] \rangle \mid \langle v\,\{\} \rangle \qquad \langle E[\mathsf{new}[T]] \rangle \to (\nu xy) \langle E[\{x,y\}] \rangle \\[1ex] (\nu xy) (\langle E_1[\mathsf{receive}[T][U]\ y] \rangle \mid \langle E_2[\mathsf{send}[V]\ v\,[W]\ x] \rangle) \to (\nu xy) (\langle E_1[\{y,v\}] \rangle \mid \langle E_2[x] \rangle) \\[1ex] (\nu xy) (\langle E_1[\mathsf{match}\ y\ \mathsf{with}\ \{l_i = t_i\}] \rangle \mid \langle E_2[(\mathsf{select}\ l_j\ \mathsf{as}\ T)\ x] \rangle) \to (\nu xy) (\langle E_1[t_j\ y] \rangle \mid \langle E_2[x] \rangle) \\[1ex] (\nu xy) (\langle E_1[\mathsf{close}\ y] \rangle \mid \langle E_2[\mathsf{close}\ x] \rangle) \to \langle E_1[\{\}] \rangle \mid \langle E_2[\{\}] \rangle \\[1ex] \dfrac{p_1 \to p_2}{(\nu xy) p_1 \to (\nu xy) p_2} \qquad \dfrac{p_1 \equiv p_2 \quad p_2 \to p_3 \quad p_3 \equiv p_4}{p_1 \to p_4} \end{array}$$

Fig. 12: Process reduction.

#### Theorem 6 (Preservation).


Progress for the term language is assured only when the typing context contains channel endpoints only. When ∆ is understood from the context we write $\Gamma^{s}$ to mean that Γ contains only types of kind s, that is, $\Delta \vdash T : s$ for all types T in Γ. Well-typed terms are values, or else they may reduce or are ready to reduce at the process level. Reduction in the case of session operations (receive, send, match, select, close) is pending a matching counterpart.

Theorem 7 (Progress for the term language). If $\Delta \mid \Gamma^{s} \vdash t : T$, then t is a value, t reduces, or t is stuck in one of the following forms: E[fork v], E[new[T]], E[receive[T][U] v], E[send[T] v [U] x], E[match y with {l<sub>i</sub> = t<sub>i</sub>}], E[(select l<sub>j</sub> as T) x], or E[close x].

In order to state our result on the absence of runtime errors we need a few notions on the structure of terms and processes; here we follow Almeida et al. [2]. The subject of an expression e, denoted by subj(e), is x in the following cases.

$$\mathsf{receive}[T][U]\ x \qquad \mathsf{send}[T]\ v\,[U]\ x \qquad \mathsf{match}\ x\ \mathsf{with}\ t \qquad (\mathsf{select}\ l_j\ \mathsf{as}\ T)\ x \qquad \mathsf{close}\ x$$

Two terms $e_1$ and $e_2$ agree on channel xy, notation $\text{agree}^{xy}(e_1, e_2)$, in the following cases (symmetric forms omitted).

$$\begin{aligned} \text{agree}^{xy}(\text{receive}[T][U] \, x, \text{send}[V] \, v[W] \, y) & \text{agree}^{xy}(\text{close} \, x, \text{close} \, y) \\ \text{agree}^{xy}(\text{match} \, x \, \text{with} \, \{\overline{l\_i = t\_i}\}\_{i \in I}, (\text{select} \, l\_j \text{ as } T) \, y) & \text{ } j \in I \end{aligned}$$

A closed process is a runtime error if it is structurally congruent to some process that contains a subexpression or subprocess of one of the following forms.


The first four cases are standard in System F with records and variants. The support for session types and concurrency in the first two cases (term and type application) is derived from the types of values for such operators (Fig. 10). Item 5 addresses session operators applied to non-endpoints. Item 6 is for two concurrent session operators on the same channel end. Finally, Item 7 is for mismatches on two session operations on two endpoints for the same channel.

#### Theorem 8 (Safety). If $\Gamma^{s} \vdash p$, then p is not a runtime error.

An algorithmic typing system can be easily extracted from the declarative system for terms in Fig. 11 via a bidirectional type system, formulated along the lines of Almeida et al. [2].

# 7 Related Work

Equirecursion in system F. In early investigations on equirecursive types, the notion of type equivalence is often formulated in a coinductive fashion [5,11,18,29,38]. Two types are equivalent if they unroll into the same infinite tree. Whenever this unrolling is the only type-level computation, such trees are regular, enabling efficient decision procedures. Some authors have studied equirecursion together with other notions of type-level computation. Solomon considers parameterized type definitions, which correspond to higher-order kinds [63]. These implicitly correspond to λ-terms, since reduction occurs as types are allowed to call other types. Some authors consider equirecursion in system $F_\omega$, with weaker or stronger notions of equality [1,12,14,41]. Regarding equirecursion in system F, the model of Cai et al. [14] is the closest to ours, and indeed our results up to $F^{\mu_*\cdot}_\omega$ can be seen as a generalisation of theirs. However, Cai et al. depart from the usual setting by allowing non-contractive types (which most authors forbid, including this work), requiring a sort of infinitary lambda calculus. Moreover, this work covers additional equivalence properties by including session types with their distinctive semantics, such as sequential composition and duality.

Session type systems. Session types were introduced in the 90s by Honda et al. [42,43,67]. Equirecursion was the first approach used to construct infinite session types, which often allows type equality to be interpreted according to a coinductive notion of bisimulation [52]. In this vein, Keizer et al. [48] utilize coalgebras to represent session types. Since the inception of session types, there has been an interest in extending the theory to nonregular protocols [58,59,66]. Context-free session types emerged as a natural extension, as they still allow for practical type equality algorithms [3,4,19,28,56,68]. Other approaches that go beyond regular session types include nested session types [24] as well as 1-counter, pushdown and 2-counter session types [33]. However, the more expressive notions are not amenable to practical type equivalence algorithms, just like the higher-order types present in our system $F^{\mu}_\omega$. Polymorphism in session types has also been a topic of interest, with or without recursion [15,22,23,31,39].

Dual type operator. This work is, to the best of our knowledge, the first that internalises duality as a type constructor. Other settings, such as the language Alms [72], consider duality for session types as a user-definable, not built in, type function. Our Dual is a type operator, not a type function. The difference is that a type function involves a type-level computation, which converges to a type written without dual. For example, in Alms we would have dual(!Int.End) = ?Int.End (as a type-level computation), both sides being the same type. In our setting, Dual(!Int; End) is a type on its own, which happens to be equivalent to ?Int; End. At the same time, our setting allows for types such as Dual α, or (Dual α); T1; T2, which do not reduce.

Type equivalence algorithms. Algorithms for deciding the equivalence of types must inherently be related to the computational power of the corresponding type system. This has been used implicitly or explicitly to obtain decidability results. As already explained, if equirecursion is the only type-level computation, types can be represented as finite-state automata (or equivalently, infinite regular trees). Although some exponential time algorithms were first proposed [32], it has been established that the problem can be solved in quadratic time [53], which is to be expected as it matches the corresponding problem of bisimulation of finite-state automata [44]; see also Pierce [57].

The next 'simplest' model of computation is that of simple grammars, which intuitively correspond to deterministic pushdown automata with a single state [33]. Almeida et al. [4] provided a practical algorithm for checking the bisimilarity of simple grammars. By dropping the determinism assumption, we arrive at Greibach normal form grammars, which are equivalent to basic process algebras [6,7]. Bisimilarity algorithms have been studied extensively in this setting [13,17,47,49]; presently it is known that the complexity of the problem lies between EXPTIME and 2-EXPTIME, which does not exclude the possibility of a polynomial time algorithm for the simpler model of simple grammars.

In this paper we present a reduction from first-order grammars to $F^{\mu}_\omega$-types, showing that the more expressive type systems ($F^{\mu}_\omega$, presented here and in Cai et al. [14], as well as its extensions) are at least as powerful as deterministic pushdown automata. As far as we know, the closest result to ours is by Solomon [63], which shows conversions between a universe of "context-free types" and deterministic context-free languages. The universe of types studied by Solomon is different from $F^{\mu}_\omega$. With some work we could prove that Solomon's types can be embedded into $F^{\mu}_\omega$, which would entail our result as a corollary. However, it is easier and simpler to prove the reduction directly, as we did.

The equivalence problem for deterministic pushdown automata was a notorious open problem for a long time, until Sénizergues showed it to be decidable [61,62]. Since his proof, many authors have tried to refine the result in an attempt to arrive at an implementable algorithm [46,64,65].

Concurrent term languages. The usefulness of a type system is directly related to its capability to be used in a programming language. Type systems such as the ones discussed in this work lend themselves quite readily to functional term languages [45]. For session types, existing term languages are inspired either by the pi calculus [26,73,69] or by the lambda calculus [35,54,70], or even by the two [71]. The system presented in this paper is linear, meaning that resources must be used exactly once [50,74]. Some authors go beyond linearity by considering unrestricted type qualifiers [48,73] or manifest sharing [8].

# 8 Conclusion and future work

This paper introduces an extension of system F which includes equirecursion, lambda abstractions, and context-free session types. We present type equivalence algorithms, and a term language and its metatheory. Although we have defined a rather general system, it turns out that for practical purposes one must restrict recursion to $\mu_*$, that is, to type-level monomorphic recursion. In any case, the main system $F^{\mu_*;}_\omega$ is a non-trivial extension of (the contractive fragment of) $F^{\mu_*}_\omega$ (studied by Cai et al. [14]) as well as $F^{\mu;}$ (studied by Almeida et al. [19]).

We have only considered polymorphic types of a functional nature: type ∀α: κ. T must always be of kind t. It is worth investigating polymorphism over session types, as it would allow additional behaviour. For example, we could be interested in streaming values of heterogeneous nature, as in type µα: s. &{Done : Skip, More : ∀β: t. ?β; α}. It is however unclear whether this extension would still allow a translation into a simple grammar.

We proved that the type equivalence problem for systems $F^{\mu}_\omega$, $F^{\mu\cdot}_\omega$, $F^{\mu;}_\omega$ is at least as hard as a problem that is not known to be efficiently decidable. We conjecture that these systems have the same power as deterministic pushdown automata (and hence, admit decidable type equivalence), but we do not have a construction to prove this result. In any case, our proof that the type equivalence problem is at least as hard as the bisimilarity of deterministic pushdown automata is enough to justify focus on the significant fragment with restricted recursion.

We study either full recursion (for theoretical results) or recursion limited to kind ∗ (for algorithmic results). It would be interesting to study in-between kinds of recursion; the next natural example is $\mu_{* \Rightarrow *}$. What model of computation would we arrive at if we consider types written with this recursion operator? We conjecture that types $F^{\mu}_\omega$ and $F^{\mu\cdot}_\omega$, when restricted to recursion of kind ∗ ⇒ ∗, would still be expressible as simple grammars, whereas such a restriction in the more powerful $F^{\mu;}_\omega$ would take us beyond this model, but perhaps without reaching the expressivity of deterministic pushdown automata.

# References



# Safe Session-Based Concurrency with Shared Linear State

Pedro Rocha and Luís Caires

NOVA LINCS, NOVA University of Lisbon, Portugal pms.rocha@campus.fct.unl.pt lcaires@fct.unl.pt

We introduce CLASS, a session-typed, higher-order, core language that supports concurrent computation with shared linear state. We believe that CLASS is the first proposal for a foundational language able to flexibly express realistic concurrent programming idioms, with a type system ensuring all the following three key properties: CLASS programs never misuse or leak stateful resources or memory, they never deadlock, and they always terminate. CLASS owes these strong properties to a propositions-as-types foundation based on Linear Logic, which we conservatively extend with logically motivated constructs for shareable affine state. We illustrate CLASS expressiveness with several examples involving memory-efficient linked data structures, sharing of resources with linear usage protocols, and sophisticated thread synchronisation, which may be type-checked with a perhaps surprisingly light type annotation burden.

# 1 Introduction

Stateful programming involving concurrency and shared state plays a prominent role in modern software development, but, in practice, getting concurrent code right is still quite hard for common developers. Typical sources of "bugs" include resource leaks (forgetting to release unused memory or close a socket), violation of resource state preconditions (writing to a closed file or sending out-of-order messages), races (data invariant breaking, erratic sharing of resources), deadlocks (indefinite wait for lock release or incoming messages), livelocks, and even general non-termination. Fifty years ago Hoare noted [40]: "Parallel programs are particularly prone to time-dependent errors, which either cannot be detected by program testing nor by run-time checks. It is therefore very important that a high-level language designed for this purpose should provide complete security against time-dependent errors by means of a compile-time check". It does not come as a surprise that finding ways to approximate such a certainly very ambitious goal is still today the object of exciting intense research.

In this paper, we approach this challenge by leveraging the propositions-as-types (PaT) paradigm towards the realm of concurrency and shared state. PaT is known to offer a unifying framework connecting logic, computation, and programming languages. Since the seminal work of Curry and Howard [42], it is a prolific structuring concept for designing and reasoning about programming languages (see [82]). Remarkably, languages derived within PaT intrinsically satisfy crucial properties: type preservation (since reduction corresponds to cut-reduction), confluence (since computation corresponds to proof simplification), deadlock freedom (as a consequence of cut-elimination) and livelock freedom / termination (as a consequence of strong normalisation).

Although PaT has a traditional focus on functional computation, the emergence of linear logic has progressively motivated interpretations of stateful/resourceful computation [78,1,14,2,12], eventually leading to the discovery of tight correspondences between session types and linear logic [22,27,81]. These systems already capture aspects of state change, namely in the sequential execution of session protocols, thus raising the question of whether such approaches could be extended to express notions of shared mutable state, subject to interference, as found in typical imperative and concurrent programs. Recently, this challenge was addressed by several works [9,64,67]. In particular, [67] developed a first basic shared state model enjoying all the aforementioned strong properties of PaT. However, although [67] supports higher-order shareable store for pure values of replicated type, it forbids linear objects, such as stateful processes or data structures with update in-place, to be stored and shared as in languages like Java, Rust, and in the CLASS core language we introduce herein.

In this work, we develop a novel, more fundamental approach to shared state and PaT, and introduce CLASS, a typed, higher-order, session based core language that supports general concurrent computation with dynamically allocated shared linear (more precisely, affine) state. We believe that CLASS is the first proposal for a foundational language able to flexibly express realistic concurrent programming idioms, while ensuring all the following three key properties by static typing: CLASS programs never misuse or leak stateful resources or memory, they never deadlock, and they always terminate.

Despite the strength of its type system, CLASS expressiveness and effectiveness substantially overcome limitations of related works, as we show with compelling program examples that can be algorithmically typed for memory safety, dead- and live-lock freedom with a perhaps surprisingly light type annotation burden. CLASS owes these strong properties to its PaT foundation based on Second-Order Linear Logic, already known to capture the polymorphic session calculus and the linear System F [74], but which we conservatively extend with novel logically motivated constructs for shareable affine state, also based on DiLL co-exponentials [35,67], but to which we give here a different, more general and fundamental interpretation.

#### 1.1 Overview

A main novelty and source of CLASS's expressiveness, flexibility and strong meta-theoretical properties resides in its mechanism for shared state composition. It is interesting to overview such mechanism in the context of the basic composition and interaction principles of the fundamental linear logic interpretations [22,27,81]. Our computational model is structured around processes that interact via binary sessions, the basic composition rules being mix and cut.

$$\frac{P \vdash \Delta\_1; \Gamma \quad Q \vdash \Delta\_2; \Gamma}{P \mid \mid Q \vdash \Delta\_1, \Delta\_2; \Gamma} \text{ [Tmix]} \quad \frac{P \vdash \Delta\_1, x: A; \Gamma \quad Q \vdash \Delta\_2, x: \overline{A}; \Gamma}{P \mid x \mid Q \vdash \Delta\_1, \Delta\_2; \Gamma} \text{ [Tcut]}$$

The mix rule types the independent composition of processes P and Q, which do not share any free names and run side-by-side without interacting. This is captured by the implicit disjointness of their linear typing contexts $\Delta_1$ and $\Delta_2$, declaring the types of their interaction channels. Interactive composition is expressed by the cut rule, which connects exactly two processes P and Q through a single linear session x with dual typed endpoints ($x : A$ and $x : \overline{A}$), following Abramsky's idea of "cut as interactive composition" [1].

Intuitively, duality of endpoint (session) types ensures that all interactions between P and Q on x always match: when P sends, Q receives; when Q offers, P chooses; and likewise for all types. Notice that sharing a single channel x between the threads P and Q is important to ensure acyclicity of proof structures, and cut-elimination/deadlock absence. But P, Q may use an arbitrary number of linear channels, in $\Delta_1$, $\Delta_2$, to also compose with other processes.
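Duality itself is a simple structural operation, which the following Python sketch spells out for the standard linear logic connectives. The tuple encoding and the constructor names (`tensor` for output, `par` for input, `plus` for choice, `with` for offer, and so on) are assumptions of this sketch and not CLASS syntax; only the duality equations themselves are the standard ones of linear logic.

```python
def dual(t):
    """Return the linear logic dual of a type given as a tagged tuple."""
    tag = t[0]
    if tag == "one":                      # dual of 1 is bottom
        return ("bot",)
    if tag == "bot":
        return ("one",)
    if tag == "tensor":                   # send: dual is par (receive)
        return ("par", dual(t[1]), dual(t[2]))
    if tag == "par":
        return ("tensor", dual(t[1]), dual(t[2]))
    if tag == "plus":                     # choose: dual is with (offer)
        return ("with", {label: dual(a) for label, a in t[1].items()})
    if tag == "with":
        return ("plus", {label: dual(a) for label, a in t[1].items()})
    if tag == "bang":                     # replicated server: dual is why-not
        return ("whynot", dual(t[1]))
    if tag == "whynot":
        return ("bang", dual(t[1]))
    raise ValueError(f"unknown type constructor: {tag}")


# A hypothetical endpoint type: choose between stopping and sending once.
client = ("plus", {"stop": ("one",), "go": ("tensor", ("one",), ("one",))})
server = dual(client)

print(server)
print(dual(server) == client)
```

The dual of the `plus` type is a `with` type over the same labels, which is the "when Q offers, P chooses" reading above, and applying `dual` twice gives back the original type.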

Shared composition in session types is available for replicated "server" objects !x(y); P, typed by the linear logic exponential type bang !A. Contraction of the dual exponential type why-not ?A allows an unbounded number of usages of such replicated server object to be introduced in client processes. In the dyadic presentation of linear logic (cf. [5,11]), contraction is expressed by moving ? typed names into the unrestricted context Γ, with the [T?] rule.

$$\frac{!x(y); P \vdash x : {!A}; \Gamma \qquad \dfrac{Q \vdash \Delta; \Gamma, x : \overline{A}}{?x; Q \vdash \Delta, x : {?\overline{A}}; \Gamma}\ \text{[T?]}}{!x(y); P \mid x \mid {?x}; Q \vdash \Delta; \Gamma} \qquad \cdots \qquad \frac{R \vdash \Delta, y : \overline{A}; \Gamma, x : \overline{A}}{\text{call } x(y); R \vdash \Delta; \Gamma, x : \overline{A}}\ \text{[Tcall]}$$

Names in Γ may be used unrestrictedly; each call (typed by [Tcall]) spawns a fresh copy of the server body at type $y : A$, to be used by the client at type $y : \overline{A}$, in a linear binary session. By the typing rule for !A (promotion) such copy does not depend on linear resources. Thus, interaction with replicated objects as captured by the exponentials !A and ?A implements a copy semantics where each call obtains a new private stateless copy of the same object.

In this work, we introduce a third composition mechanism, allowing processes to concurrently share mutex memory cells, storing linear state. Mutex memory cells and their usages are typed respectively by a pair of dual modalities S•A and U•A, whose logical rules are motivated by Differential Linear Logic (DiLL) [35], in particular cocontraction, expressed by the type rule [Tsh].

$$\frac{P \vdash \Delta, x: \mathsf{U}_{\bullet}A; \Gamma \qquad Q \vdash \Delta', x: \mathsf{U}_{\bullet}A; \Gamma}{\text{share } x\ \{P \mid\mid Q\} \vdash \Delta, \Delta', x: \mathsf{U}_{\bullet}A; \Gamma} \text{ [Tsh]}$$

While sharing of replicated objects corresponds to contraction of ?A types, shared usage of mutex cells corresponds to cocontraction of U•A types. Apart from the explicit use of [Tsh], the type system ensures that memory cells are always used linearly. The shared usage x : U•A is free in the conclusion of the typing rule, therefore a memory cell may be shared by an arbitrary number of processes, by nested iterated use of cocontraction.

Moreover, cocontraction also ensures that concurrent processes may share only a single mutex cell (just like [Tcut] w.r.t. binary sessions). This constraint comes from the linear logic discipline, and it is important to ensure deadlock freedom. As discussed in Concluding Remarks, this does not hinder CLASS expressiveness - e.g., a single mutex cell may act as a gateway to further bundles of shared state, organised in resource hierarchies, as our examples illustrate - and even suggests convenient concurrent programming structuring techniques.

To access a mutex memory cell in its (unlocked) full state, typed by U•A, the client uses a take operation. Take waits to acquire the cell lock and reads its contents. The cell then transitions to the (locked) empty state, typed by U◦A. The taking client becomes solely responsible for filling back the cell contents, using a put operation. This restores the cell to the full state, releasing its lock and making it accessible to other concurrent threads waiting to take it. Our mutex memory cell object is thus akin to a behaviourally typed incarnation of Concurrent Haskell MVars [45] or Rust std::sync::Mutex objects [46].
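The take/put discipline just described can be approximated at runtime by a small full/empty cell. The following is only a minimal sketch in Python, not the CLASS implementation: the names `MutexCell`, `increment_many` and `run_counter` are hypothetical, and the misuse that CLASS rules out by typing (a put on a full cell) is detected here dynamically instead.

```python
import threading


class MutexCell:
    """Illustrative full/empty cell (hypothetical name, not part of CLASS)."""

    def __init__(self, contents):
        self._cond = threading.Condition()
        self._full = True            # full state: the cell is unlocked
        self._contents = contents

    def take(self):
        """Block until the cell is full, then empty it and return its contents."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False       # empty state: the caller now holds the lock
            contents, self._contents = self._contents, None
            return contents

    def put(self, contents):
        """Refill an empty cell and wake up any thread blocked in take()."""
        with self._cond:
            if self._full:
                # CLASS excludes this case statically; here it is a runtime check.
                raise RuntimeError("put on a full cell")
            self._contents = contents
            self._full = True
            self._cond.notify_all()


def increment_many(cell, times):
    for _ in range(times):
        value = cell.take()          # critical section starts: cell is empty
        cell.put(value + 1)          # critical section ends: cell is full again


def run_counter(threads=4, times=500):
    cell = MutexCell(0)
    workers = [threading.Thread(target=increment_many, args=(cell, times))
               for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return cell.take()
```

Because every update happens between a take and the matching put, no increment is lost, whatever the interleaving of the worker threads.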

To ensure safe releasing of a memory cell, its contents are required to be of affine type ∧A. Affine objects are well-behaved disposable values that, when discarded, safely dispose of all resources they hereditarily refer to, this being ensured by the linear logic typing.

We illustrate the introduced concepts with a simple example, where two concurrent threads compete to switch on an initially off flag, but only one may win. The flag iteratively announces its state to the client with either #Off or #On. If the state is off, the client must select #turnOn; if the state is on, it will remain on. Process flag(f) implements the flag (at name f) in the off state, and process on(f) in the on state, defined thus:

$$\begin{array}{l} \mathsf{flag}(f) = \#\mathsf{Off}\ f; \mathsf{case}\ f\{ \mid \#\mathsf{turn}\mathsf{On}:\mathsf{affine}\ f; \mathsf{on}(f) \ \} \\ \mathsf{on}(f) \ = \#\mathsf{On}\ f; \mathsf{affine}\ f; \mathsf{on}(f) \end{array}$$

The flag object is typed with the (linear) usage protocol defined by the coinductive type Flag below, such that flag(f) ⊢ f : Flag and on(f) ⊢ f : Flag.

type corec Flag = ⊕{ |#Off : &{ |#turnOn : ∧Flag }, |#On : ∧Flag }

We now consider a scenario where a flag object is shared via a mutex memory cell c, initially storing an off flag of type ∧Flag, among two concurrent clients.

    client(c, id) ⊢ c : U•Flag; id : int
    client(c, id) = take c(f);
      case f {
        |#Off : println id + ": wins."; #turnOn f; put c(f); release c
        |#On : println id + ": loses."; put c(f); release c
      }

    main() ⊢ ∅
    main() = cut { cell c(f. affine f; flag(f))
                   |c : U•Flag|
                   share c { client(c, 1) || client(c, 2) } }

When running main(), exactly one of the threads (executing the same code, just with a different id) will turn the flag on and win; the other will lose. Notice that all threads drop their usage of the memory cell c using release, which corresponds to DiLL coweakening [35].
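For readers more familiar with mainstream threading libraries, the behaviour of this example can be mimicked as follows. This is a rough sketch under stated assumptions, not a translation of CLASS: a one-slot `queue.Queue` stands in for the mutex cell (`get` plays the role of take, `put` of put), the flag protocol is reduced to the strings "off" and "on", and there is no counterpart of release. The function names and the `log` list are illustrative.

```python
import queue
import threading


def client(cell, ident, log):
    state = cell.get()               # take: blocks until the cell is full
    if state == "off":
        log.append((ident, "wins"))
        cell.put("on")               # put the flag back, now turned on
    else:
        log.append((ident, "loses"))
        cell.put(state)              # put the flag back unchanged


def main():
    cell = queue.Queue(maxsize=1)    # one slot: the cell is either full or empty
    cell.put("off")                  # the cell initially stores an off flag
    log = []
    threads = [threading.Thread(target=client, args=(cell, ident, log))
               for ident in (1, 2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Whichever thread takes the cell first observes the off state and wins; the other can only take the cell after the put, so it observes the on state and loses.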

When considering a new language, in particular with a static typing discipline, it is necessary to argue about its expressiveness, and aim for a better perception of how natural programs get past its typing rules, and of how types help in structuring programs. In this paper, we approach these concerns by showcasing many interesting examples that challenge the expressiveness of the CLASS language and type system on realistic concurrent programming scenarios. We have developed many more examples, distributed with our implementation [68], combining imperative, higher-order functional, and session-based programming styles. For all these programs, strong guarantees of memory safety, deadlock-freedom, termination, and absence of "dynamic bugs", even in the presence of blocking primitives and higher-order state, are compositionally certified by our lightweight type discipline based on Propositions-as-Types and Linear Logic.

#### 1.2 Outline and Contributions

We believe that CLASS is the first proposal for a foundational language able to flexibly express realistic concurrent programming idioms while ensuring by typing three key properties: CLASS programs never misuse or leak stateful resources or memory, they never deadlock, and they always terminate.

In Section 2 we formally present the core language CLASS, its type system and operational semantics. Our model builds on the propositions-as-types approach to session-based concurrency [22,27,80], extending Second-Order Classical Linear Logic with inductive/coinductive types, affine types, and novel primitives for shareable first-class mutex reference cells for linear state.

In Section 3 we state and prove type preservation (Theorem 1), progress (Theorem 2) which implies deadlock-freedom, and strong normalisation (Theorem 3), which also implies livelock absence. Our proof uses a logical relations argument, extended with an interesting technique to handle shared state interference, which we believe is exploited here for the first time.

Given the strong properties of its type system, it is of course very important to substantiate our claims about CLASS expressiveness. In Section 4 we illustrate the expressiveness of the CLASS language and type system by going through a series of compelling examples. Namely, we discuss a general technique for sharing linear protocols; a shareable linked list with update in-place; a shareable buffered channel, using a linked list with pointers to tail and head nodes, and executing send and receive operations in O(1) time; the dining philosophers, illustrating techniques that rely on our type structure to encode resource acquisition hierarchies; a generic barrier for n threads; and a Hoare style monitor with await/notify conditions, where our implementation of the condition's process queue is supported by a dynamic linked data structure, as in real systems code.

Section 5 discusses related work. Section 6 offers concluding remarks and suggests further research. Complete definitions and detailed proofs of all results are provided in [65].

# 2 The Core Language and its Type System

We present the core language, type system, and operational semantics of CLASS. The language is based on a propositions-as-types (PaT) correspondence with Linear Logic, so terms of the language correspond to proof rules. We start with types and duality.

Definition 1 (Types). Types A, B of CLASS are defined by

$$\begin{array}{rcl} A, B & ::= & X \mid \mathbf{1} \mid \bot \mid A \otimes B \mid A \mathbin{⅋} B \mid A \oplus B \mid A \mathbin{\&} B \\ & \mid & {!}A \mid {?}A \mid \exists X.A \mid \forall X.A \mid \mu X.A \mid \nu X.A \\ & \mid & \land A \mid \lor A \mid \mathsf{S}\_{\bullet}A \mid \mathsf{S}\_{\circ}A \mid \mathsf{U}\_{\bullet}A \mid \mathsf{U}\_{\circ}A \end{array}$$

Types in the first two rows correspond to Second-Order Classical Linear Logic, extended with inductive/coinductive types (µ, ν). Types comprise variables (X), units (1, ⊥), multiplicatives (⊗, ⅋), additives (⊕, &), exponentials (!, ?) and quantifiers (∃, ∀). The third row extends basic types with affine (∧, ∨) and new modalities (S•, U•, S◦, U◦) to type shared affine state. Duality is the involution operation A ↦ Ā on types, corresponding to Linear Logic negation, defined by

$$\begin{array}{lll} \overline{\mathbf{1}} = \bot & \overline{A \otimes B} = \overline{A} \mathbin{⅋} \overline{B} & \overline{A \oplus B} = \overline{A} \mathbin{\&} \overline{B} \\ \overline{\land A} = \lor \overline{A} & \overline{\exists X.A} = \forall X.\overline{A} & \overline{\mu X.A} = \nu X.\{\overline{X}/X\}\overline{A} \end{array}$$
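To make the involution property concrete, the following Python sketch computes duals for a small fragment of the type language: units, multiplicatives, additives and the affine modalities. It is an illustration only; the class names and string tags are hypothetical, and quantifiers, (co)inductive types and the state modalities are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class One:
    pass


@dataclass(frozen=True)
class Bot:
    pass


@dataclass(frozen=True)
class Bin:
    op: str        # one of "tensor", "par", "plus", "with"
    left: object
    right: object


@dataclass(frozen=True)
class Mod:
    op: str        # "affine" or "coaffine"
    body: object


# Each connective is swapped with its dual connective.
DUAL_BIN = {"tensor": "par", "par": "tensor", "plus": "with", "with": "plus"}
DUAL_MOD = {"affine": "coaffine", "coaffine": "affine"}


def dual(t):
    """Dual of a type in the fragment; duality is pushed down to the leaves."""
    if isinstance(t, One):
        return Bot()
    if isinstance(t, Bot):
        return One()
    if isinstance(t, Bin):
        return Bin(DUAL_BIN[t.op], dual(t.left), dual(t.right))
    if isinstance(t, Mod):
        return Mod(DUAL_MOD[t.op], dual(t.body))
    raise TypeError("type outside the illustrated fragment")
```

Applying `dual` twice returns the original type, which is the involution property stated for duality.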

Duality captures symmetry in process interaction, as manifest in the cut rule. In our system, typing judgements have the form P ⊢<sub>η</sub> ∆; Γ. The typing context ∆; Γ is dyadic [4,15,63,22], where ∆ is handled linearly and Γ is unrestricted; both ∆ and Γ assign types to names. The index η is a finite map that holds coinduction hypotheses to type corecursive processes, as detailed later.

Definition 2. The typing rules of CLASS are presented in Figs. 1 to 5.

The type system corresponds, via propositions-as-types [22,27,80], to Second-Order Classical Linear Logic (Fig. 1) with inductive/coinductive types (Fig. 2), affinity (Fig. 3) and extended with constructs for shared mutable state (Figs. 4 - 5). The basic composition rules are [Tmix] and [Tcut], which correspond to mix and cut of Linear Logic, respectively. [Tmix] types a parallel composition P || Q, where P and Q run in parallel without interfering. On the other hand, [Tcut] types linear interactive composition P |x : A| Q: processes P and Q run concurrently and communicate through a private linear session x, session endpoints being typed by dual types A/Ā. When the cut type annotation does not play any role, we may omit it and write P |x| Q. In examples, for readability, we use cut {P |x| Q} and par {P || Q} instead of P |x| Q and P || Q, respectively.

For the basic process constructs [22,27,80,19], ⊗/⅋ type send and receive, ⊕/& type choice and offer (in examples we use labelled choice), !/? type

Fig. 1: Typing Rules I: Second-Order Classical Linear Logic, including [T0], [Tmix] and [Tcut].


$$\frac{P \vdash\_{\eta'} \Delta, z:A; \Gamma \quad \eta' = \eta, X(z, w) \mapsto \Delta, z:Y; \Gamma}{\text{corec}\ X(z, w); P\ [x, y] \vdash\_{\eta} \{y/w\} \Delta, x:\nu Y.\ A; \{y/w\} \Gamma} \text{ [Tcorec]}$$

$$\frac{\eta = \eta', X(x, y) \mapsto \Delta, x:Y; \Gamma}{X(z, w) \vdash\_{\eta} \{w/y\} \Delta, z:Y; \{w/y\} \Gamma} \text{ [Tvar]}$$

$$\frac{P \vdash\_{\eta} \Delta, x:\{\mu X. A/X\} A; \Gamma}{\text{unfold}\_{\mu}\ x; P \vdash\_{\eta} \Delta, x:\mu X. A; \Gamma} \text{ [T}\mu\text{]} \quad \frac{P \vdash\_{\eta} \Delta, x:\{\nu X. A/X\} A; \Gamma}{\text{unfold}\_{\nu}\ x; P \vdash\_{\eta} \Delta, x:\nu X. A; \Gamma} \text{ [T}\nu\text{]}$$

Fig. 2: Typing Rules II: Induction and Coinduction.

$$\frac{P \vdash\_{\eta} a : A, \mathbf{b} : \lor \mathbf{B}, \mathbf{c} : \mathsf{U}\_{\bullet} \mathbf{C}; \varGamma}{\text{affine}\_{\mathbf{b}, \mathbf{c}} \; a; P \vdash\_{\eta} a : \land A, \mathbf{b} : \lor \mathbf{B}, \mathbf{c} : \mathsf{U}\_{\bullet} \mathbf{C}; \varGamma} \; \text{[Taffine]}$$

$$\frac{}{\text{discard } a \vdash\_{\eta} a : \lor A; \varGamma} \; \text{[Tdiscard]} \qquad \frac{Q \vdash\_{\eta} \Delta, a : A; \varGamma}{\text{use } a; Q \vdash\_{\eta} \Delta, a : \lor A; \varGamma} \; \text{[Tuse]}$$

Fig. 3: Typing Rules III: Affinity.

replicated servers and their invocation, ∀/∃ type receive and send of types, implementing polymorphic processes.

Coinductive types are introduced by rule [Tcorec]. It types corecursive processes corec X(z, w); P [x, y], with parameters z, w bound in P, that are instantiated with the arguments x, y (free in the process term). By convention, the coinductive behaviour, of type νY. A, of a corecursive process is always offered in the first argument z. According to [Tcorec], to type the body P of a corecursive process, the map η is extended with a coinductive hypothesis binding the process variable X to the typing context ∆, z : Y ; Γ, so that when typing the body P of the corecursion we can appeal to X, which intuitively stands for P itself, and recover its typing invariant. Crucially, the type variable Y is free only in z : A. This causes corecursive calls to be always applied to names z′ that hereditarily descend from the initial corecursive argument z, a necessary condition for strong normalisation (Theorem 3), and morally corresponds to only allowing corecursive calls on "smaller" argument sessions (of inductive type).

Rule [Tvar] types a corecursive call X(z, w) by looking up in η for the corresponding binding and renaming the parameters with the arguments of the call. Inductive and coinductive types are explicitly unfolded with [Tµ] and [Tν].

To simplify the presentation in program examples, we omit explicit unfolding actions, and write inductive and coinductive type definitions with equations of the form rec A = f(A) and corec B = f(B) instead of A = µX. f(X) and B = νX. f(X), respectively. Similarly, we write corecursive process definitions as Q(x, y) = f(Q(−)) instead of Q(x, y) = corec X(z, w); f(X(−)) [x, y], while of course respecting the constraints imposed by typing rules [Tvar] and [Tcorec].

Affinity. Affinity is important to model discardable linear resources, and plays an important role in CLASS. An affine session can either be used as a linear session or discarded. The typing rules for the affine modalities are in Fig. 3. Affine sessions are introduced by rule [Taffine], which promotes a linear a : A to an affine session a : ∧A. It types affine<sub>b,c</sub> a; P, which provides an affine session at a and continues as P, and follows the structure of a standard promotion rule.

A session a may be promoted to affine if it only depends on resources that can be disposed, i.e. resources that satisfy some form of weakening capability, namely: coaffine sessions b<sub>i</sub> of type ∨B<sub>i</sub>, that can be discarded; full cell usages c<sub>i</sub> of type U•C<sub>i</sub>, that can be released; and unrestricted sessions in Γ, which are implicitly ?-typed. The dependencies of an affine object on coaffine or full


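$$\frac{P \vdash\_{\eta} \Delta, a : \land A; \Gamma}{\text{cell } c(a.P) \vdash\_{\eta} \Delta, c : \mathsf{S}\_{\bullet}A; \Gamma} \text{ [Tcell]} \qquad \frac{}{\text{release } c \vdash\_{\eta} c : \mathsf{U}\_{\bullet}A; \Gamma} \text{ [Trelease]} \qquad \frac{}{\text{empty } c \vdash\_{\eta} c : \mathsf{S}\_{\circ}A; \Gamma} \text{ [Tempty]}$$

$$\frac{Q \vdash\_{\eta} \Delta, a : \lor A, c : \mathsf{U}\_{\circ}A; \Gamma}{\text{take } c(a); Q \vdash\_{\eta} \Delta, c : \mathsf{U}\_{\bullet}A; \Gamma} \text{ [Ttake]} \qquad \frac{Q\_1 \vdash\_{\eta} \Delta\_1, a : \land A; \Gamma \quad Q\_2 \vdash\_{\eta} \Delta\_2, c : \mathsf{U}\_{\bullet}A; \Gamma}{\text{put } c(a.Q\_1); Q\_2 \vdash\_{\eta} \Delta\_1, \Delta\_2, c : \mathsf{U}\_{\circ}A; \Gamma} \text{ [Tput]}$$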

Fig. 4: Typing Rules IV: Reference Cells.

cell objects are explicitly annotated as b, c in the process term, to instrument the operational semantics, but we often omit them and simply write affine a; P.

The coaffine endpoint ∨A of an affine session, dual of ∧A, has two operations: use and discard. Rule [Tuse] types a process use a; Q that uses a coaffine session a and continues as Q; it is a dereliction rule. [Tdiscard] types the process discard a that discards a coaffine session a; it is a weakening rule.

Shared Mutable State. Shared state is introduced in CLASS by typed constructs that model mutex memory cells, and associated cell operations allowing their use by client code, defined by the typing rules in Fig. 4.

At any moment a cell may be either full or empty, akin to the MVars of Concurrent Haskell [45]. A full cell on c, written cell c(a.P), is typed S•A by rule [Tcell]. Such a cell stores an affine session of type ∧A, implemented at a by P. All objects stored in cells are required to be affine, so that memory cells may always be safely disposed without causing memory leaks. An empty cell on c, of type S◦A, and written empty c, is typed by rule [Tempty].

Client processes manipulate cells via take, put and release operations. These operations apply to names of cell usage types - U•A (full cell usage) and U◦A (empty cell usage) - which are dual types of S•A and S◦A, respectively. At any given moment, a client thread owning a U•A-typed usage to a cell may execute a take operation, typed by rule [Ttake]. The take operation take c(a); Q waits to acquire the cell mutex c, and reads its contents into parameter a, the linear (actually coaffine, of type ∨A) usage for the object stored in the cell; the cell becomes empty, and execution continues as Q.

It is the responsibility of the taking thread to put some value back in the empty cell, thus releasing the lock and causing the cell to transition to the full state. The put operation put c(a.Q1); Q<sub>2</sub> is typed by [Tput]; the stored object a, implemented by Q1, is required to be affine, as specified in the premise a : ∧A.

Hence a cell flips from full to empty and back: [Ttake] uses the cell c at U•A type, and its continuation (in the premise) at U◦A type; symmetrically, [Tput] uses the cell c at U◦A type, and its continuation (in the premise) at U•A type.

The release c operation allows a thread to manifestly drop its cell usage c. Release is typed by [Trelease] (cf. coweakening [35]); a usage may only be released

$$\begin{array}{c} P \vdash\_{\eta} \Delta', c: \mathsf{U}\_{\bullet} \mathsf{A}; \varGamma \quad Q \vdash\_{\eta} \Delta, c: \mathsf{U}\_{\bullet} \mathsf{A}; \varGamma \\\hline \text{share } c \; \{P \; \mid \mid Q\} \vdash\_{\eta} \Delta', \Delta, c: \mathsf{U}\_{\bullet} \mathsf{A}; \varGamma \end{array} \text{[Tsh]}$$

$$\begin{array}{c} P \vdash\_{\eta} \Delta', c: \mathsf{U}\_{\circ} \mathsf{A}; \varGamma \quad Q \vdash\_{\eta} \Delta, c: \mathsf{U}\_{\bullet} \mathsf{A}; \varGamma \\\hline \text{share } c \; \{P \; \mid \mid Q\} \vdash\_{\eta} \Delta', \Delta, c: \mathsf{U}\_{\circ} \mathsf{A}; \varGamma \end{array} \text{[TshL]}$$

$$\begin{array}{c} P \vdash\_{\eta} \Delta', c: \mathsf{U}\_{\bullet} \mathsf{A}; \varGamma \quad Q \vdash\_{\eta} \Delta, c: \mathsf{U}\_{\circ} \mathsf{A}; \varGamma \\\hline \text{share } c \; \{P \; \mid \mid Q\} \vdash\_{\eta} \Delta', \Delta, c: \mathsf{U}\_{\circ} \mathsf{A}; \varGamma \end{array} \text{[TshR]}$$

Fig. 5: Typing Rules V: State Sharing.

in the unlocked state U•A. When, for some cell c, all the owning threads release their usages, which eventually happens in well-typed programs, the cell c gets disposed, and its (affine) contents safely discarded.
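The full/empty alternation imposed by [Ttake], [Tput] and [Trelease] on a single usage can be pictured as a three-state protocol. The sketch below is a simplification under explicit assumptions: it checks one thread's straight-line sequence of operations on a usage that is neither shared further nor passed to another process, and the function name `check_usage` is hypothetical.

```python
def check_usage(ops):
    """Check a straight-line sequence of operations on one cell usage.

    The usage starts full (U•A). A take moves it to empty (U◦A), a put moves
    it back to full, and a release ends the usage and is only allowed when the
    usage is full. Since the usage is linear, it must end with a release.
    """
    state = "full"
    for op in ops:
        if op == "take" and state == "full":
            state = "empty"
        elif op == "put" and state == "empty":
            state = "full"
        elif op == "release" and state == "full":
            state = "released"
        else:
            return False             # operation not allowed in this state
    return state == "released"
```

For instance, `["take", "put", "release"]` is accepted, while `["take", "release"]` is rejected because the usage would be dropped while the cell is still locked.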

Our memory cells are linear objects, with a linear mutable payload, which are never duplicated by reduction or conversion rules. However, in CLASS, multiple cell usages may be shared between concurrent threads, which compete to take and use the cell in interleaved critical sections. Such aliased usages may be passed around and duplicated dynamically, changing the sharing topology at runtime.

Sharing of cell usages is logically expressed in our system by the typing rules in Fig. 5. Co-contraction, introduced in Differential Linear Logic DiLL [35], allows finite multisets of linear resources to safely interact in cut-reduction, resolving concurrent sharing into nondeterminism, as required here to soundly model memory cells and their linear concurrent usages. Rule [Tsh] interprets cocontraction with the construct share c {P || Q}, and types sharing of the cell usage c : U•A between the concurrent threads P and Q.

Contrary to cut, share c {P || Q} is not a binding operator for c. The shared usage c : U•A is free in the conclusion of the typing rule, permitting c to be shared among an arbitrary number of threads, by nested iterated use of [Tsh]. In [Tsh], P and Q only share the single mutex cell c, since the linear context is split multiplicatively, just like [Tcut] w.r.t. binary sessions. This condition comes from the DiLL typing discipline, and is important to ensure deadlock freedom.
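As a worked instance of iterating [Tsh], consider three hypothetical threads P1, P2 and P3 (names chosen only for illustration), each typed with its own linear context and a usage c : U•A. Applying [Tsh] twice yields a derivation in which c remains free and the linear contexts are kept disjoint:

$$\frac{P\_1 \vdash\_{\eta} \Delta\_1, c: \mathsf{U}\_{\bullet}A; \Gamma \qquad \dfrac{P\_2 \vdash\_{\eta} \Delta\_2, c: \mathsf{U}\_{\bullet}A; \Gamma \qquad P\_3 \vdash\_{\eta} \Delta\_3, c: \mathsf{U}\_{\bullet}A; \Gamma}{\text{share } c\ \{P\_2 \mid\mid P\_3\} \vdash\_{\eta} \Delta\_2, \Delta\_3, c: \mathsf{U}\_{\bullet}A; \Gamma}}{\text{share } c\ \{P\_1 \mid\mid \text{share } c\ \{P\_2 \mid\mid P\_3\}\} \vdash\_{\eta} \Delta\_1, \Delta\_2, \Delta\_3, c: \mathsf{U}\_{\bullet}A; \Gamma}$$

The only name common to the three threads in the linear part of the conclusion is c, which is the single point of sharing.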

While [Tsh] types sharing of a full (unlocked) cell usage of type U•A, the symmetric rules [TshL] and [TshR] type sharing of an empty (locked) cell usage of type U◦A. We may verify that for every cell c in a well-typed process, at most one unguarded operation to c may be using type U◦A; all the remaining unguarded operations to c must be using type U•A. This implies that, at runtime, only one thread may own the lock for a given (necessarily empty) cell and execute a put to it, which will bring the cell back to full and release its lock; other threads must be either attempting to take or releasing the reference.

Working together, the sharing typing rules ensure that in any well-typed cell sharing tree, at most one single thread at any time may be actively using a cell (in the locked empty state) and put to it, thus guaranteeing mutual exclusion, while satisfying Progress (Theorem 2) which in turn ensures deadlock absence, even in the presence of the crucially blocking behaviour of the take operation.
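The way the three sharing rules combine can be pictured by computing the usage type of a sharing tree from the usage types of its leaves. The sketch below is an abstraction under stated assumptions: threads are reduced to the usage type (full or empty) at which they use the cell c, a pair stands for share c {left || right}, and the names `share_type` and `empty_leaves` are hypothetical.

```python
FULL, EMPTY = "U_full", "U_empty"


def share_type(tree):
    """Usage type of a sharing tree, following the three rules of Fig. 5.

    A tree is either a leaf (FULL or EMPTY) or a pair (left, right).
    Raises TypeError when no sharing rule applies.
    """
    if tree in (FULL, EMPTY):
        return tree
    left, right = tree
    l, r = share_type(left), share_type(right)
    if l == FULL and r == FULL:
        return FULL                  # [Tsh]
    if l == EMPTY and r == FULL:
        return EMPTY                 # [TshL]
    if l == FULL and r == EMPTY:
        return EMPTY                 # [TshR]
    raise TypeError("two empty usages of the same cell cannot be shared")


def empty_leaves(tree):
    """Number of leaves that use the cell at the empty (locked) type."""
    if tree == EMPTY:
        return 1
    if tree == FULL:
        return 0
    left, right = tree
    return empty_leaves(left) + empty_leaves(right)
```

Every tree accepted by `share_type` has at most one empty leaf, which mirrors the mutual exclusion property: at most one thread holds the lock and may put.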

#### 2.1 Operational Semantics

We now define CLASS operational semantics, which is given by a structural precongruence relation ≤ that captures static relations on processes, essentially rearranging them, and a reduction relation → that captures process interaction.

Definition 3 (P ≡ Q and P ≤ Q). Structural congruence ≡ is the least congruence on processes closed under α-conversion and the ≡-rules in Fig. 6. Structural precongruence ≤ is the least precongruence on processes including ≡ and closed under α-conversion and the ≤-rules in Fig. 6.

The basic rules of ≡ essentially reflect the expected static laws, along the lines of the structural congruences / conversions in [22,80]. The binary operators forwarder, cut and share are commutative ([comm]). The set of processes modulo ≡ is a commutative monoid with binary operation given by parallel composition and identity given by inaction 0 ([par]). Any two static constructs commute, as expressed by the laws [CM]-[ShC!]. Furthermore, we can distribute the unrestricted cut over all the static constructs as expressed by law [D-C!X], where ∗ stands for either a mix, linear or unrestricted cut or a share.

The commuting conversions [ShTake] and [ShPut] allow take and put operations on cell usages to commute with a share construct. Rule [ShTake] picks the take that occurs on the left argument; however, since share is commutative, a right-biased version of [ShTake] is admissible. Using [ShTake], any of the two possible interleavings for two concurrent takes may be nondeterministically picked via ≤. Indeed, we express ≤ as a precongruence because it introduces nondeterminism, and does not express a behavioural equivalence as ≡ does. N.B.: Although one could easily formulate a confluent version of CLASS semantics, using explicit sums as in [13,66,35,65], we prefer in this paper to focus on the expressiveness of CLASS as a programming language and on its deadlock and livelock absence properties, adopting a nondeterministic reduction relation.

In [ShPut] only a put, in the U◦A-typed premise of [TshL], may be propagated up and eventually update the cell, causing it to transit back to the full state. Hence, take operations originating in the U•A-typed premise of [TshR] will be blocked, waiting until such (unique) put propagation occurs. Algebraically, rule [ShRel] expresses that the release operation is the identity for share composition; we orient it as a precongruence to ensure type preservation.

Definition 4 (Reduction →). Reduction → is defined by the rules of Fig. 7.

We let →<sup>∗</sup> stand for the reflexive-transitive closure of →. Reduction includes the set of principal cut conversions, i.e. the redexes for each pair of interacting constructs. It is closed by structural precongruence ([≤]) and in rule [cong] we consider that C is a static context, i.e. a process context in which the hole is covered only by the static constructs mix, cut and share.

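$$\begin{array}{ll}
\text{fwd } x\ y \equiv \text{fwd } y\ x \qquad P \mid x \mid Q \equiv Q \mid x \mid P \qquad \text{share } x\ \{P \mid\mid Q\} \equiv \text{share } x\ \{Q \mid\mid P\} & \text{[comm]} \\
P \mid\mid 0 \equiv P \qquad P \mid\mid Q \equiv Q \mid\mid P \qquad P \mid\mid (Q \mid\mid R) \equiv (P \mid\mid Q) \mid\mid R & \text{[par]} \\
P \mid x \mid (Q \mid\mid R) \equiv (P \mid x \mid Q) \mid\mid R & \text{[CM]} \\
P \mid x \mid (Q \mid y \mid R) \equiv (P \mid x \mid Q) \mid y \mid R & \text{[CC]} \\
P \mid x \mid \text{share } y\ \{Q \mid\mid R\} \equiv \text{share } y\ \{P \mid x \mid Q \mid\mid R\} & \text{[CSh]} \\
P \mid z \mid (y.Q \mid {!x} \mid R) \equiv y.Q \mid {!x} \mid (P \mid z \mid R) & \text{[CC!]} \\
y.Q \mid {!x} \mid (P \mid\mid R) \equiv P \mid\mid (y.Q \mid {!x} \mid R) & \text{[C!M]} \\
y.P \mid {!x} : A \mid (w.Q \mid {!z} : B \mid R) \equiv w.Q \mid {!z} : B \mid (y.P \mid {!x} : A \mid R) & \text{[C!C!]} \\
\text{share } x\ \{P \mid\mid (Q \mid\mid R)\} \equiv \text{share } x\ \{P \mid\mid Q\} \mid\mid R & \text{[ShM]} \\
\text{share } x\ \{P \mid\mid \text{share } y\ \{Q \mid\mid R\}\} \equiv \text{share } y\ \{\text{share } x\ \{P \mid\mid Q\} \mid\mid R\} & \text{[ShSh]} \\
\text{share } z\ \{P \mid\mid y.Q \mid {!x} \mid R\} \equiv y.Q \mid {!x} \mid \text{share } z\ \{P \mid\mid R\} & \text{[ShC!]} \\
y.P \mid {!x} : A \mid (Q \ast R) \equiv (y.P \mid {!x} : A \mid Q) \ast (y.P \mid {!x} : A \mid R) & \text{[D-C!X]} \\
\text{share } x\ \{\text{release } x \mid\mid P\} \leq P & \text{[ShRel]} \\
\text{share } x\ \{\text{put } x(y.P); Q \mid\mid R\} \leq \text{put } x(y.P); \text{share } x\ \{Q \mid\mid R\} & \text{[ShPut]} \\
\text{share } x\ \{\text{take } x(y\_1); P\_1 \mid\mid \text{take } x(y\_2); P\_2\} \leq \text{take } x(y\_1); \text{share } x\ \{P\_1 \mid\mid \text{take } x(y\_2); P\_2\} & \text{[ShTake]}
\end{array}$$

Provisos: in [CM] and [ShM], x ∈ fn(Q); in [CC], [CSh] and [ShSh], x, y ∈ fn(Q); in [CC!], [C!M] and [ShC!], x ∉ fn(P); in [C!C!], x ∉ fn(Q) and z ∉ fn(P).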

Fig. 6: Structural congruence P ≡ Q and precongruence P ≤ Q.

Operationally, the forwarding behaviour is implemented by name substitution [23] ([fwd]). All the other conversions apply to a principal cut between two dual actions. Reduction rules for the basic session constructs that interpret Second-Order Linear Logic and recursion are the expected ones [22,27,80], along predictable lines. For readability, we omit the type declarations in the cuts, as they do not actually play any role in reduction.

We comment on the rules concerning affinity. The interaction between an affine session and a use operation is defined by reduction rule [∧∨u], where a cut on a : ∧A between affine<sub>b,c</sub> a; P and use a; Q reduces to a cut on a : A between the continuations P and Q. The reduction between an affine session and a discard operation is defined by [∧∨d]: a cut between affine<sub>b,c</sub> a; P and discard a reduces to a mix-composition of discards (for the coaffine sessions b) and releases (for the cell usages c) (cf. [6,20]). In the corner case where b and c are empty, the right-hand side of [∧∨d] simply degenerates to inaction 0 (the identity of mix).

The reductions for the mutable state operations are fairly self-explanatory. In rule [S•U•r], a cut between a full mutex cell and a release operation reduces to a process that discards the affine cell contents, cf. rule [∧∨d]. In rule [S•U•t], a cut on c : S•A between a full cell and a take operation reduces to a process with

(Reduction rules [fwd], [∧∨u], [∧∨d], [S•U•r], [S•U•t] and [S◦U◦], together with the principal cut conversions for the basic session constructs and the closure rules [≤] and [cong].)

Fig. 7: Reduction P → Q.

two cuts, both composed with the continuation {a/a′}Q of the take. The outer cut on a : ∧A composes with the stored affine session, which was successfully acquired by the take operation. The inner cut on c : S◦A composes with the reference cell c, which has become empty in the reductum. Finally, in rule [S◦U◦], a cut on session c : S◦A between an empty cell and a put operation reduces to a cut on session c : S•A between a full cell, which now stores the session that was put, and the continuation of the put process. Notice that the locking/unlocking behaviour of cells is simply modelled by rewriting of the process terms, from cell to empty and back, as is typical in process calculi.

# 3 Type Safety and Strong Normalisation

In this section we state and give proof sketches for our main results of type safety and strong normalisation. Full proofs may be found in [65].

Type Preservation. The semantics of CLASS is defined by a set of precongruence ≤ and reduction → rules on process terms. Theorem 1 shows that these relations preserve typing, and gives substance to our PaT approach, showing that every ≤ and → rule corresponds to a conversion on type derivations/proofs.

Theorem 1 (Type Preservation). Suppose P ⊢<sub>η</sub> ∆; Γ. (1) If P ≤ Q, then Q ⊢<sub>η</sub> ∆; Γ. (2) If P → Q, then Q ⊢<sub>η</sub> ∆; Γ.

Proof. By induction on derivations for P ≤ Q (resp. P → Q), we verify that all the rules of ≤ (Def. 3) (resp. → (Def. 4)) are type preserving.

Progress. We prove the progress property for well-typed CLASS processes. The following notion of live process becomes useful. A process P is live if and only if P = C[Q], for some static context C (the hole lies within the scope of static constructs mix, cut and share) and Q is an active process (a process with a topmost action prefix, such as a receive or a take, or a forwarder). We first show that a live well-typed process either reduces or offers an interaction with its environment on a free name. The following observability predicate (cf. [70]) characterises the interactions of a process with its environment.

Definition 5 (P ↓<sub>x</sub>). The predicate P ↓<sub>x</sub> is defined by the rules of Fig. 8.

The predicate P ↓<sub>x</sub> holds if P offers an immediate interaction (unguarded action) on free name x. We can observe the subject of an action (rule [act]) and x, y of a forwarder fwd x y. The definition of P ↓<sub>x</sub> is closed by ≤ and propagates observations over the various static operators. Cut-bound names are not free, hence cannot be observed. Share share y {P || Q} propagates all the observations x for which x ≠ y and by applying ≤ rules [ShTake], [ShRel] or [ShPut] via [≤], an interaction on x may be observed. We have:

Lemma 1 (Liveness). Let P ⊢<sub>∅</sub> ∆; Γ be live. Either P ↓<sub>x</sub> or P reduces.

Proof. (Sketch) By induction on a derivation for P ⊢<sub>∅</sub> ∆; Γ, along the lines of [27]. To handle case [Tcut] P = P<sub>1</sub> |y| P<sub>2</sub>: both P<sub>1</sub> and P<sub>2</sub> are live, since both type with a nonempty linear typing context, hence we can apply the induction hypothesis (i.h.) to both premises of [Tcut]: either (i) one of P<sub>1</sub> and P<sub>2</sub> reduces or (ii) both P<sub>1</sub> ↓<sub>x<sub>1</sub></sub> and P<sub>2</sub> ↓<sub>x<sub>2</sub></sub>. If (i), then P reduces. Case (ii) follows because, crucially, P<sub>1</sub> and P<sub>2</sub> synchronise through a single private session y: then either x<sub>1</sub> ≠ y or x<sub>2</sub> ≠ y, in which case we can observe either x<sub>1</sub> or x<sub>2</sub>; or x<sub>1</sub> = x<sub>2</sub> = y, in which case we can trigger a reduction, by applying ≤ rules to P in order to exhibit a principal cut. For case [Tsh] P = share y {P<sub>1</sub> || P<sub>2</sub>}: since P<sub>1</sub> and P<sub>2</sub> are live, we apply the i.h. to both premises. The interesting case occurs when P<sub>1</sub> ↓<sub>x<sub>1</sub></sub> and P<sub>2</sub> ↓<sub>x<sub>2</sub></sub>. Co-contraction implies that P<sub>1</sub> and P<sub>2</sub> share the single usage y, so if x<sub>1</sub> ≠ y or x<sub>2</sub> ≠ y, we have either P ↓<sub>x<sub>1</sub></sub> or P ↓<sub>x<sub>2</sub></sub>. If x<sub>1</sub> = x<sub>2</sub> = y, then we derive P ↓<sub>y</sub>: the observation corresponds to either a take or a release operation on y, which we commute up with [ShTake] or [ShRel]. For [TshL] P = share y {P<sub>1</sub> || P<sub>2</sub>}, we apply the i.h. to the premise P<sub>1</sub>, which types with an empty usage on y. If P<sub>1</sub> ↓<sub>y</sub>, then P ↓<sub>y</sub>, the observation corresponding to a put operation on y, which we commute up with [ShPut]. Symmetrically for [TshR].

Theorem 2 (Progress). Let P ⊢<sup>∅</sup> ∅; ∅ be a live process. Then, P reduces.

Proof. Follows from Lemma 1 since fn(P) = ∅.

$$\begin{array}{c}
\dfrac{}{\mathsf{fwd}\ x\ y \downarrow\_{x}} \qquad \dfrac{}{\mathsf{fwd}\ x\ y \downarrow\_{y}} \qquad \dfrac{s(\mathcal{A}) = x}{\mathcal{A} \downarrow\_{x}}\ [\mathsf{act}] \qquad \dfrac{P \leq Q \quad Q \downarrow\_{x}}{P \downarrow\_{x}}\ [\leq] \\[2ex]
\dfrac{P \downarrow\_{x}}{(P \parallel Q) \downarrow\_{x}} \qquad \dfrac{P \downarrow\_{x} \quad x \neq y}{(P\ |y|\ Q) \downarrow\_{x}} \qquad \dfrac{P \downarrow\_{x} \quad x \neq y}{(\mathsf{share}\ y\ \{P \parallel Q\}) \downarrow\_{x}}
\end{array}$$

Fig. 8: Observability Predicate P ↓x.

Remarkably, our proof of Theorem 2 leverages deep properties of Linear Logic, in particular the structure of the linear cut and co-contraction, allowing us to prove deadlock absence, even in a language with primitives exhibiting blocking behaviour, avoiding the use of extra mechanisms [47,33,48,10,25,76,31].

Strong Normalisation. Establishing strong normalisation (SN) for concurrent process calculi is usually fairly challenging, particularly in the presence of name passing, recursion and higher-order shared state [32,16,83,49,69]. For example, with reference cells one may express general recursion with Landin's knot, and, in general, circular chains of references that may lead to divergence. However, our linear type system uses primitive recursion and corecursion, and excludes cyclic dependencies through state or session-based interaction, allowing strong normalisation, and therefore livelock absence, to hold. Our proof relies on defining suitable linear logical relations, cf. [62,21,72], adapted to Classical Linear Logic [38,1,8], and crucially relies on a notion of reducibility up to interference that imposes stronger properties on the interpretation of the state modalities, and which allows the inductive proof of the Fundamental Lemma 2 to go through in the usual way. To this end, we extend our basic language with auxiliary constructs cell c(a.S) and empty c(a.S), which denote memory cells subject to interference from concurrent writers, allowed to take terms from the set S ⊆ {P | P ⊢<sup>η</sup> a : ∧A}. The intuition is that a take on the cell may always read any object from S, due to interference. We also consider the additional (nondeterministic) reduction rules (1)-(3), where in (1) and (2) we assume P ∈ S.

$$\begin{array}{llll}
\text{cell } c(a.S) \ |c| \ \text{release } c & \rightarrow & P \ |a| \ \text{discard } a & (1) \\
\text{cell } c(a.S) \ |c| \ \text{take } c(a'); Q & \rightarrow & \text{empty } c(a.S) \ |c| \ (P \ |a| \ \{a/a'\} Q) & (2) \\
\text{empty } c(a.S) \ |c| \ \text{put } c(a.P); Q & \rightarrow & \text{cell } c(a.S) \ |c| \ Q & (3)
\end{array}$$

In this section, we thus consider the reduction relation P → Q to be the one defined in Fig. 7, extended with these rules. When a take or a release interacts with cell c(a.S), an arbitrary element P of the set S may be picked (rules (1) and (2)). In rule (3), put c(a.P); Q interacts with empty c(a.S), causing it to evolve to cell c(a.S). The following notion is also useful. A process P is S-preserving on x if P ⊢<sup>η</sup> x : U•A or P ⊢<sup>η</sup> x : U◦A, and

– if P →<sup>∗</sup>≈ take x(y); P′ and Q ∈ S, then Q |y| P′ is S-preserving on x.

– if P →<sup>∗</sup>≈ put x(y.P<sub>1</sub>); P<sub>2</sub>, then P<sub>1</sub> ∈ S and P<sub>2</sub> is S-preserving on x.

A set of processes T is S-preserving on x if and only if, for all P ∈ T, P is S-preserving on x. Intuitively, a process P that uses a cell x is S-preserving on x if it only puts values from S on cell x. The notion of S-preservation, parametric on any S, makes explicit the conditions needed for safe interaction with a memory cell subject to interference, while ensuring a state invariant S on the cell contents. We now introduce the logical predicate.

Definition 6 (Logical Predicate ⟦x : A⟧<sub>σ</sub>). By induction on the type A, we define the sets ⟦x : A⟧<sub>σ</sub> as shown in Fig. 9, such that ⟦x : U•A⟧<sub>σ</sub> and ⟦x : U◦A⟧<sub>σ</sub> are ⟦− : ∧A⟧-preserving on x. The definition is direct for positive types A; for negative types B, it is given by orthogonality.

The definition relies on Girard's notion of orthogonality S<sup>⊥</sup> ≜ {P | ∀Q ∈ S. P |x| Q is SN} [37]. Duality promotes succinctness in our definition: for negative types A, ⟦x : A⟧<sub>σ</sub> is defined as the orthogonal of the predicate for its dual (positive) type. To handle polymorphic and inductive types, the logical predicate is indexed by a map σ that assigns reducibility candidates R[x : A] to type variables. A reducibility candidate R[x : A] is any set S of processes P ⊢<sup>∅</sup> x : A such that P is SN and S = S<sup>⊥⊥</sup>. We let R[− : A] be the set of all reducibility candidates R[x : A] for some name x. The definition relies on a congruence relation ≈ extending ≤ with a complete set of commuting conversions, along standard lines [22,27,80]. It essentially plays the role of the labelled transition system in the proof of strong normalisation given in [62].

We extend the logical predicate to typing judgements P ⊢<sup>η</sup> ∆; Γ by universal closure over the typing context and σ.

Definition 7 (Extended Logical Predicate L⟦⊢<sup>η</sup> ∆; Γ⟧<sub>σ</sub>). We define L⟦⊢<sup>η</sup> ∆; Γ⟧<sub>σ</sub> inductively on ∆, Γ and η as the set of processes P ⊢<sup>η</sup> ∆; Γ s.t.

– P ∈ L⟦⊢<sup>∅</sup> ∅; ∅⟧<sub>σ</sub> if P is SN.

– P ∈ L⟦⊢<sup>∅</sup> ∆, x : A; Γ⟧<sub>σ</sub> if ∀Q ∈ ⟦x : A⟧<sub>σ</sub>. Q |x : A| P ∈ L⟦⊢<sup>∅</sup> ∆; Γ⟧<sub>σ</sub>.

– P ∈ L⟦⊢<sup>∅</sup> ∆; Γ, x : A⟧<sub>σ</sub> if ∀Q ∈ ⟦y : A⟧<sub>σ</sub>. y.Q |!x : A| P ∈ L⟦⊢<sup>∅</sup> ∆; Γ⟧<sub>σ</sub>.

– P ∈ L⟦⊢<sup>η, X(x,y) ↦ ∆′, x : Y; Γ</sup> ∆; Γ⟧<sub>σ</sub> if ∀Q ∈ σ(Y). {Q/X}P ∈ L⟦⊢<sup>η</sup> ∆; Γ⟧<sub>σ</sub>.

We now state the Fundamental Lemma (2) from which Theorem 3 follows.

Lemma 2 (Fundamental Lemma). If P ⊢<sup>η</sup> ∆; Γ, then P ∈ L⟦⊢<sup>η</sup> ∆; Γ⟧<sub>σ</sub>.

Proof. (Sketch) By induction on P ⊢<sup>η</sup> ∆; Γ. For cases [Tcell] and [Tempty], we show that cell c(a.S) and empty c(a.S) respectively simulate cell c(a.P) (where P ∈ S) and empty c, when composed with any usages that are S-preserving on c. To handle one of the most challenging cases, [Tsh], we prove, for all S, and all processes P<sub>1</sub> and P<sub>2</sub> that are S-preserving on c, that cell c(a.S) |c| share c {P<sub>1</sub> || P<sub>2</sub>} (1) is simulated by (cell c(a.S) |c| P<sub>1</sub>) || (cell c(a.S) |c| P<sub>2</sub>) (2). This allows us to infer that if (2) is SN, then so is (1). When S = ⟦a : ∧A⟧<sub>σ</sub>, the i.h. yields that (cell c(a.S) |c| P<sub>i</sub>) is SN, hence we conclude that (2) is SN. Similarly for [TshL], [TshR].

Theorem 3 (Strong Normalisation). If P ⊢<sup>∅</sup> ∅; ∅, then P is SN.

$$\begin{array}{lll}
[\![x : X]\!]\_{\sigma} & \triangleq & \sigma(X)[x] \\
[\![x : \mathbf{1}]\!]\_{\sigma} & \triangleq & \{P \mid P \approx \mathsf{close}\ x \text{ and } P \text{ is SN}\}^{\bot\bot} \\
[\![x : A \otimes B]\!]\_{\sigma} & \triangleq & \{P \mid \exists P\_{1}, P\_{2}.\ P \approx \mathsf{send}\ x(y.P\_{1}); P\_{2} \text{ and } P\_{1} \in [\![y : A]\!]\_{\sigma} \text{ and } P\_{2} \in [\![x : B]\!]\_{\sigma}\}^{\bot\bot} \\
[\![x : A \oplus B]\!]\_{\sigma} & \triangleq & \{P \mid \exists Q.\ (P \approx x.\mathsf{inl}; Q \text{ and } Q \in [\![x : A]\!]\_{\sigma}) \text{ or } (P \approx x.\mathsf{inr}; Q \text{ and } Q \in [\![x : B]\!]\_{\sigma})\}^{\bot\bot} \\
[\![x : {!}A]\!]\_{\sigma} & \triangleq & \{P \mid \exists Q.\ P \approx {!}x(y); Q \text{ and } Q \in [\![y : A]\!]\_{\sigma}\}^{\bot\bot} \\
[\![x : \exists X.A]\!]\_{\sigma} & \triangleq & \{P \mid \exists Q,\ S \in \mathcal{R}[- : B].\ P \approx \mathsf{send}\ x(B); Q \text{ and } Q \in [\![x : A]\!]\_{\sigma[X \mapsto S]}\}^{\bot\bot} \\
& \vdots &
\end{array}$$

Fig. 9: Logical Predicate ⟦x : A⟧<sub>σ</sub>.

# 4 Typeful Concurrent Programming in CLASS

In this section, we discuss the expressiveness of CLASS's type system, going through a sequence of illustrative realistic concurrent programming idioms.

Sharing a Linear Session. Our first example illustrates how objects subject to a linear usage protocol and satisfying an invariant may be shared among multiple concurrent clients by serialising linear usages using a mutex cell, alternating ownership from the cell to clients and back at the invariant state, a commonly used discipline to implement and reason about resource sharing (see, e.g., [39,17,9]). We illustrate with a basic toggle switch with two states, On and Off, where the resource invariant is the state Off, and two operations #turnOn and #turnOff that must be executed in strict linear sequence (Fig. 10). The toggle protocol, defined by type Off, offers the single option #turnOn, after which it evolves to On. Conversely, type On offers the single option #turnOff, after which it evolves to an affine Off. The toggle process at t is defined by two mutually corecursive processes on(t) and off(t), which define the expected behaviour and comply with types On and Off.

Process main() introduces a mutex cell c storing an affine toggle object at the invariant type ∧Off. It then shares the cell between two concurrent clients; each acquires the toggle at the invariant type and uses the linear protocol independently. After their linear interaction, they put back the toggle; the type system ensures that this can only happen when the invariant (given by the cell type) holds. When they are done, both clients release their respective usages of c, which ultimately leads to the cell being deallocated and the (affine) toggle being discarded.

    type corec Off = &{|#turnOn : On}
    type corec On = &{|#turnOff : ∧Off}

    off(t) ⊢ t : Off
    off(t) = case t {|#turnOn : on(t)}

    on(t) ⊢ t : On
    on(t) = case t {|#turnOff : affine t; off(t)}

    client1(c) ⊢ c : S•Off
    client1(c) = take c(t); #turnOn t; #turnOff t; put c(t); release c

    client2(c) ⊢ c : S•Off
    client2(c) = take c(t); #turnOn t; #turnOff t; #turnOn t; #turnOff t; put c(t); release c

    main() ⊢ ∅
    main() = cut {cell c(t.affine t; off(t)) |c| share c { client1(c) || client2(c) }}

Fig. 10: Sharing a Linear Toggle Switch
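
The ownership discipline of Fig. 10 can be approximated in a mainstream language. The following Python sketch is only an analogy, not CLASS code: the names `Toggle`, `Cell`, `client` and `run_toggle_demo` are hypothetical, and the protocol and invariant checks that CLASS performs statically through typing are replaced here by runtime checks.

```python
import threading


class Toggle:
    """A toggle whose operations must alternate: turn_on, turn_off, turn_on, ..."""

    def __init__(self):
        self.on = False  # the invariant state is Off

    def turn_on(self):
        if self.on:
            raise RuntimeError("protocol violation: toggle is already On")
        self.on = True

    def turn_off(self):
        if not self.on:
            raise RuntimeError("protocol violation: toggle is already Off")
        self.on = False


class Cell:
    """Mutex cell: take() locks and empties the cell, put() refills and unlocks it."""

    def __init__(self, value, invariant):
        self._value = value
        self._invariant = invariant
        self._lock = threading.Lock()

    def take(self):
        self._lock.acquire()
        value, self._value = self._value, None
        return value

    def put(self, value):
        # Runtime stand-in (an assumption of this sketch) for the static check
        # that the stored object is back at the invariant type.
        if not self._invariant(value):
            raise ValueError("invariant violated on put")
        self._value = value
        self._lock.release()


def client(cell, rounds, log):
    toggle = cell.take()          # acquire exclusive ownership of the toggle
    for _ in range(rounds):
        toggle.turn_on()
        toggle.turn_off()
    log.append(rounds)            # still inside the critical section
    cell.put(toggle)              # give the toggle back in state Off


def run_toggle_demo():
    cell = Cell(Toggle(), invariant=lambda t: not t.on)
    log = []
    threads = [threading.Thread(target=client, args=(cell, r, log)) for r in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log), cell.take().on
```

Calling `run_toggle_demo()` runs two clients that toggle once and twice respectively, mirroring client1 and client2; whichever client takes the cell first, the toggle ends in the Off state.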

    type rec SList(A) = S•List(A)
    type rec List(A) = ⊕{ |#Null : 1, |#Next : ∧A ⊗ SList(A) }

    nil(l) ⊢ l : ∧List(A)
    nil(l) = affine l; #Null l; close l

    cnext(a, c, l) ⊢ a : ∨A, c : SList(A), l : ∧List(A)
    cnext(a, c, l) = affine l; #Next l; send l(a); fwd l c

    append(c, l′, c′) =
      take c(l);
      case l {
        |#Null : wait l; put c(l′); fwd c c′
        |#Next : recv l(a);
                 cut { append(l, l′, x) |x| put c(y.cnext(a, x, y)); fwd c c′ } }

Fig. 11: A Linked List with an Append In-Place Operation.

We have also developed CLASS code for a generic (polymorphic) wrapper factory that, for any affine corecursive protocol, generates a wrapper to a general invariant-based sharing interface.

Linked Lists, Update In-Place. In this example, we show how inductive/coinductive types combine harmoniously with CLASS state modalities to type linked data structures with memory-efficient updates in-place. Specifically, we show how to code a linked list, parametric on the type A of its affine values, with update in-place append (Fig. 11). An object of type SList(A) is a (full) cell storing a List(A) object. An object of type List(A) is a session that either selects #Null (the list is empty), in which case it closes; or selects #Next, in which case it sends an affine session ∧A representing the head element and continues as the tail SList(A). Process nil(l) defines an empty list at l, and process cnext(a, c, l) constructs a nonempty list l with head a and tail c. For example, a list with elements a, b stored at c<sub>1</sub> : S•List(A) is represented by

    cut{ cell c1(l1.cnext(a, c2, l1)) |c2| cell c2(l2.cnext(b, cs, l2)) |cs| cell cs(l0.nil(l0))}

Process append(c, l′, c′) ⊢ c : SList(A), l′ : List(A), c′ : SList(A) produces on c′ the result of appending l′ (in place) to c. It takes the list l stored in c, and then performs case analysis on l. If l selects #Null, it simply replaces the previous null node of c by l′ and forwards the updated cell c to the output c′. This corresponds to the recursion base case in which the list l is empty.

If l selects #Next, in which case l has at least one element, the process receives at l the node element a : ∨A, corecursively calls append to append l′ to the tail l : SList(A), and puts back in c the element a and the tail x "returned" by the call. Notice that x is exactly l (by forwarding), which was passed along linearly. Remarkably, the append(c, l′, c′) operation just defined may be safely applied concurrently to the same shared linked list, with the final result being the correct one (some serialisation of the appends), without deadlocks or livelocks. It is also interesting to see how the type system forbids a list to be appended to itself.
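
To make the update-in-place idea concrete, here is a minimal Python sketch under stated assumptions: each list cell is a lock-protected box holding either `NULL` or a `(head, tail_cell)` pair, and `append` walks the chain, taking and putting one cell at a time. The names `Cell`, `list_of`, `append`, `to_items` and `run_append_demo` are hypothetical. Unlike CLASS, nothing here prevents a list from being appended to itself; that guarantee comes from the type system and is not reproduced.

```python
import threading

NULL = None  # contents of the cell that terminates a list


class Cell:
    """Lock-protected box: take() acquires it, put() stores a value and frees it."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def take(self):
        self._lock.acquire()
        return self._value

    def put(self, value):
        self._value = value
        self._lock.release()


def list_of(items):
    """Build list contents: NULL, or a (head, tail_cell) pair."""
    node = NULL
    for x in reversed(items):
        node = (x, Cell(node))
    return node


def append(c, l2):
    """Replace the terminating NULL of the list stored in cell c by l2, in place."""
    while True:
        l = c.take()
        if l is NULL:
            c.put(l2)          # base case: overwrite the null node
            return
        c.put(l)               # node unchanged; unlock it and move to the tail
        _, c = l


def to_items(c):
    items = []
    while True:
        l = c.take()
        c.put(l)
        if l is NULL:
            return items
        head, c = l
        items.append(head)


def run_append_demo():
    c = Cell(list_of([1, 2]))
    threads = [threading.Thread(target=append, args=(c, list_of([x]))) for x in (3, 4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return to_items(c)
```

Two concurrent appends yield `[1, 2, 3, 4]` or `[1, 2, 4, 3]`, that is, some serialisation of the appends, because each null node is replaced atomically under its own lock.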

We have also developed many other in-place operations on linked data structures, such as insertion sort, and other kinds of linked structures such as queues and binary search trees. In the next example we discuss a shared queue ADT with a fine-grained locking discipline and O(1) enqueue and dequeue operations.

A Concurrent Shareable Buffered Channel. We illustrate increased degrees of sharing in a mutable data structure with various references pointing to different parts of it, how the CLASS type system may express interfaces that talk about different client views for using a stateful object, and the use of polymorphism to implement information hiding, ensuring that client code will never break the representation invariants of stateful ADTs, which is particularly challenging when aliasing and sharing are involved.

More concretely, we consider a shareable buffered channel (Fig. 12), and provide a realistic and efficient implementation [56] based on a message queue represented by a linked list with update-in-place (cf. Section 4 above) and two independent pointers: one to the head of the list, used for receiving, and another to the tail, used for sending. The operations are executed in O(1) time. Moreover, we provide a typing with two separate send and receive views, which may be used by an arbitrary number of concurrent clients. In particular, when the list is nonempty, both send and receive run in true concurrency (asynchronously), without blocking each other, thanks to fine-grained locking.

The buffered channel type BChan(M), where M is the type of messages, offers two views: SendT(M) and RecvT(M), the interfaces for sender and receiver endpoint clients. These views are exposed with a par (⅋), since they share an underlying resourceful structure. In fact, they could not be exported using a tensor (⊗); it is interesting to notice how the type system imposes these constraints, which are important to ensure deadlock freedom. The representation type of both views is Rep = S•SList(M) (see Section 4), hidden behind the SV and RV existential types [29,58]; sending clients use a cell storing a reference to the tail node of the queue; receiving clients use a cell storing a reference to the head node of the queue.

    type BChan(M) = SendT(M) ⅋ RecvT(M)
    type SendT(M) = ∃SV. !MenuS(M, SV) ⊗ SV
    type RecvT(M) = ∃RV. !MenuR(M, RV) ⊗ RV
    type MenuS(M, SV) = &{ |#Send : SV ⊸ ∧M ⊸ SV,
                           |#Share : SV ⊸ (SV ⅋ SV),
                           |#Free : SV ⊸ 1 }
    type MenuR(M, RV) = &{ |#Recv : RV ⊸ (Maybe(∧M) ⊗ RV),
                           |#Share : RV ⊸ (RV ⅋ RV),
                           |#Free : RV ⊸ 1 }
    Rep = SV = RV = S•SList(M)

    msend(me) =
      recv me(tailptr); recv me(a);
      take tailptr(c); take c(l);
      cut { cell c′(l) |c′|
            share c′ { put c(l′.cnext(a, c′, l′)); release c′
                       ||
                       put tailptr(c′); send me(tailptr); close me } }

Fig. 12: A Concurrent Shareable Buffered Channel.

Clients use the buffer through references of abstract type SV and RV and replicated menus !MenuS(M, SV) and !MenuR(M, RV). Both menus export the options #Share and #Free to allow sharing and release of the views. To send, a client selects #Send, sends its handle (of opaque type SV) and the message to send, and receives the (linear) handle back. In this implementation, receive is nonblocking, so operation #Recv returns a Maybe(∧M) value: the client receives either #Nothing (if the buffer is empty) or, otherwise, #Just followed by a message a. Later in Section 4 we discuss the implementation, in CLASS, of (Hoare style) monitors with conditions, which would allow a blocking receive to be implemented.

Process msend(me) implements the #Send "method". It first receives the sending view handle (of concrete type Rep), which is a cell with the tailptr, and the message a to be sent. Then, a new cell c′ with nil(l) is created, and the current tail of the list c is updated with a new node storing a and pointing to c′. Finally, the tailptr cell is updated to point to the new tail node c′ of the linked list.
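
The structure described above (a linked list with separate head and tail pointers, each guarded independently) resembles a classic two-lock queue. The Python sketch below is an analogy under stated assumptions, not a translation of the CLASS code: `Node`, `BufferedChannel` and `run_channel_demo` are hypothetical names, and the `("Just", msg)` / `("Nothing", None)` tuples stand in for the Maybe(∧M) result of the nonblocking receive.

```python
import threading


class Node:
    """A list cell: empty (next is None) or holding a message and a link."""

    def __init__(self):
        self.lock = threading.Lock()
        self.msg = None
        self.next = None


class BufferedChannel:
    def __init__(self):
        nil = Node()
        self._head, self._tail = nil, nil
        self._head_lock = threading.Lock()   # guards the head pointer (receivers)
        self._tail_lock = threading.Lock()   # guards the tail pointer (senders)

    def send(self, msg):
        new_nil = Node()                     # fresh empty node, the new tail
        with self._tail_lock:                # "take tailptr"
            old = self._tail
            with old.lock:                   # "take" the current tail node
                old.msg, old.next = msg, new_nil
            self._tail = new_nil             # "put tailptr"

    def recv(self):
        """Nonblocking receive: ("Just", msg) or ("Nothing", None)."""
        with self._head_lock:
            node = self._head
            with node.lock:
                if node.next is None:        # empty node: the buffer is empty
                    return ("Nothing", None)
                self._head = node.next
                return ("Just", node.msg)


def run_channel_demo():
    ch = BufferedChannel()

    def sender(start):
        for x in range(start, start + 50):
            ch.send(x)

    threads = [threading.Thread(target=sender, args=(s,)) for s in (0, 50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    received = []
    while True:
        tag, msg = ch.recv()
        if tag == "Nothing":
            return received
        received.append(msg)
```

Senders only contend on the tail pointer and receivers only on the head pointer; the two sides meet at a node lock only when the queue is empty or nearly so. Node locks are always acquired after a pointer lock and never nested with each other, so the lock order is acyclic in this sketch.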

Dining Philosophers. A resource hierarchy solution for the dining philosophers problem [34] requires forks to be acquired in a defined order. We "encode" such order in CLASS with an explicit (necessarily) acyclic structure, which informs the type system about the code safety. This allows us to define a correct implementation that satisfies deadlock freedom by pure linear logic typing. More concretely, we organise the forks in a linked chain defined by the inductive types rec Fork = S•Node and rec Node = ⊕{#Null : 1, #Next : Fork}.

Any fork in the chain may be shared by an arbitrary number of philosophers; cocontraction ensures that philosophers cannot communicate between themselves via any other channel, so all synchronisation must happen via the chained forks. Furthermore, the chain can be resized and grow unboundedly to accommodate an arbitrary number of philosophers. If a philosopher successfully takes a fork f<sub>i</sub>, he can then take any fork f<sub>j</sub> with i < j; crucially, he must follow the path dictated by the chain, hence cannot acquire forks f<sub>j</sub> with j < i. In Fig. 13 we define the eat operation, which allows each philosopher P<sub>i</sub>, with 0 ≤ i < k − 1, to eat: it acquires two consecutive forks in the chain. We also define eat2, which is the specific eating operation for the symmetry breaker P<sub>k−1</sub>: it acquires the first fork, and traverses the chain to acquire the last with takeLast(n, x) ⊢ n : Fork, x : Fork ⊗ 1.

Fig. 13: The Dining Philosophers.
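
The acquisition order described above can be illustrated with ordinary locks. In this Python sketch (the function `dine` and its parameters are hypothetical), fork j is modelled as the j-th lock of a list rather than a node of a linked chain, and every philosopher acquires its two forks in increasing index order, which is the resource hierarchy discipline. Deadlock freedom here follows from that ordering by the usual argument; it is not checked by any type system.

```python
import threading


def dine(k=5, rounds=50):
    """Run k >= 2 philosophers for `rounds` meals each; return the meal counts."""
    forks = [threading.Lock() for _ in range(k)]   # fork j sits at position j
    meals = [0] * k

    def philosopher(i):
        if i < k - 1:
            first, second = i, i + 1      # eat: two consecutive forks
        else:
            first, second = 0, k - 1      # eat2: the symmetry breaker takes
                                          # the first fork, then the last one
        for _ in range(rounds):
            with forks[first]:            # lower index is always taken first
                with forks[second]:
                    meals[i] += 1         # only philosopher i writes meals[i]

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals
```

Because no philosopher ever holds a higher-indexed fork while waiting for a lower-indexed one, a cycle of waiting philosophers cannot form, so `dine` always terminates with every philosopher having eaten `rounds` times.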

A Barrier for N threads. We describe in Fig. 14 a CLASS implementation of a simple barrier, parametric on the number N of threads to synchronise. We find it interesting to model the "real" code shown in the Rust reference page for std::sync::Mutex [46]. The code uses if-then-else and primitive integers, as offered in our implementation, that could be defined as idioms in CLASS. We represent a barrier by a mutex cell storing a pair consisting of an integer n, holding the number of threads that have not yet reached the barrier, and a stack s of waiting threads, each represented by a session of affine type ∧⊥ (so they will be safely aborted if at least one thread fails to reach the barrier).

The type Barrier of the barrier is S•BState, where BState ≜ Int ⊗ ∧List(∧⊥). Initially the barrier is initialised with n = N threads and an empty stack, so that the invariant n + depth(s) = N holds during execution. Each thread(c; i) acquires the barrier c and checks if it is the last thread to reach the barrier (if n == 1): in this case, it awakes all the waiting threads (awakeAll(ws)) and resets the barrier. Otherwise, it updates the barrier by decrementing n and pushing its continuation into the stack (the continuation for thread i just prints "finished"). The following process main() ⊢ ∅ creates a new barrier c and spawns N threads, each labelled by a unique id i: main() ≜ cut { cell c(ws.init(ws)) |c| spawnAll(c; 0, N) }. Again, our type system statically ensures that the code does not deadlock or livelock.

```
init(ws) ⊢ ws : ∧BState
init(ws) ≜
  affine ws; send ws(N); affine ws; nil(ws)

awakeAll(ws : List(∧⊥))
awakeAll(ws) ≜
  case ws {
    #Nil : wait ws; 0
    #Cons :
      recv ws(w);
      par { close w || awakeAll(ws) } }

spawnAll(c; i, n) ⊢ c : Barrier; i : Int, n : Int
spawnAll(c; i, n) ≜
  if (n == 0) { release c }
  { share c {
      thread(c; i)
      ||
      spawnAll(c; i + 1, n − 1) } }

thread(c; i) ⊢ c : Barrier; i : Int
thread(c; i) =
  println i + ": waiting.";
  take c(ws); recv ws(n);
  if (n == 1) {
    par {
      println i + ": finished.";
      awakeAll(ws)
      ||
      put c(ws′.init(ws′));
      release c } }
  { cut {
      affine w; wait w;
      println i + ": finished."; 0
      |w| put c(ws′.affine ws′;
                send ws′(n − 1);
                affine ws′;
                cons(w, ws, ws′));
          release c } }
```
Fig. 14: A Barrier for N Threads
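
For readers more familiar with mainstream threading libraries, the same barrier structure (one mutex guarding a counter and a stack of waiting continuations) can be sketched in Python. This is an illustration under assumptions, not the CLASS code: the class `Barrier`, the function `run_barrier` and the use of `threading.Event` objects as "continuations" are choices of this sketch. Python also ships a ready-made `threading.Barrier`; the hand-rolled version is used only to mirror the state invariant n + depth(s) = N.

```python
import threading


class Barrier:
    """Barrier state kept behind one mutex: a counter plus a stack of waiters."""

    def __init__(self, parties):
        self._parties = parties
        self._lock = threading.Lock()
        self._remaining = parties      # threads that have not yet arrived (n)
        self._waiting = []             # stack s; invariant: n + len(s) == parties

    def wait(self):
        with self._lock:               # "take" the barrier state
            if self._remaining == 1:   # last thread to arrive
                to_wake, self._waiting = self._waiting, []
                self._remaining = self._parties   # reset the barrier
                wakeup = None
            else:
                self._remaining -= 1
                wakeup = threading.Event()        # this thread's continuation
                self._waiting.append(wakeup)
                to_wake = []
        for event in to_wake:          # awake all waiters, outside the lock
            event.set()
        if wakeup is not None:
            wakeup.wait()


def run_barrier(n):
    barrier = Barrier(n)
    log, log_lock = [], threading.Lock()

    def thread(i):
        with log_lock:
            log.append(("waiting", i))
        barrier.wait()
        with log_lock:
            log.append(("finished", i))

    threads = [threading.Thread(target=thread, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log, every "waiting" entry precedes every "finished" entry, since no thread passes the barrier before the last one has arrived.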

A Hoare Style Monitor. A Hoare style monitor is a well-known powerful programming abstraction [39], allowing concurrent operations on shared data to be coordinated in a sound way, so that the data always satisfies a correctness invariant. The key idea is that concurrent client threads use the monitor lock to access the protected state in mutual exclusion, but may also wait (via an await primitive) inside the monitor until the state satisfies specific (pre-)conditions, while transferring state ownership to other threads potentially responsible for establishing such conditions and announcing it (via a notify primitive).

We discuss a CLASS implementation of a monitor, sketching the main components and how they are typed (Fig. 15). We consider a counter with value n, with increment #Inc and decrement #Dec operations, and subject to the invariant n ≥ 0. The type of the counter CounterI exposes two separate, coinductively defned, client interfaces DecI and IncI for decrementing and incrementing.

While the #Inc operation is synchronous, the #Dec operation is always called asynchronously by passing a continuation (of type ContDec). This allows decrementers to wait inside the monitor for condition NZ (n > 0) when n = 0. The condition NZ is represented by a wait queue of type WaitQ. The representation type of the monitor (Rep) holds the counter value and the wait queue. Each node in the wait queue stores information, of type ContDecW, for the waiting thread.

    type corec IncI ≜ &{|#Inc : IncI, |#End : ⊥}
    type corec DecI ≜ ∨ &{|#Dec : ∨(ContDec ⊸ ⊥), |#End : ⊥}
    type corec ContDec ≜ ∨(DecI ⊗ 1)
    type CounterI ≜ DecI ⅋ IncI
    type rec Rep ≜ (!Int) ⊗ WaitQ
    type rec WaitQ ≜ ∧ ⊕{|#Null : 1, |#Next : NodeQ}
    type rec NodeQ ≜ S•(ContDecW ⊗ WaitQ)
    type rec ContDecW ≜ ∧(∧Rep ⊸ ∧Rep ⊗ DecI ⊸ ⊥)

    awaitNZ ⊢ m : U◦Rep, n : !Int, w : WaitQ, cc : ContDecW
    notifyNZ ⊢ m : U◦Rep, s : Rep, m′ : S•Rep
    incloop ⊢ iv : IncI, m : U•Rep

    awaitNZ(m, n, w, cc) ≜
      put m(w′.affine v; send w′(n); consWQ(cc, w, w′)); release m

    incloop(iv, m) ≜
      case iv {
        #Inc : take m(r); recv r(n);
               cut { send s(n + 1); fwd s r
                     |s| notifyNZ(m, s, m′)
                     |m′| incloop(iv, m′) }
        #End : wait iv; release m }

Fig. 15: Implementing a Counter Monitor with Await / Notify.

Every such ContDecW object stores (1) the pending action on the internal monitor state (of type ∧Rep ⊸ ∧Rep), to be executed after await returns, and (2) a callback to the continuation provided by the external client in the asynchronous call (of type DecI ⊸ ⊥).

The awaitNZ(m, n, w, cc) process implements the monitor wait operation, used in the #Dec operation. It receives the (empty) cell usage m to the monitor state, the integer value n (where n = 0), a reference w to the wait queue, and the continuation cc. It pushes a new node in the queue and puts the monitor state back, unlocking the cell m, and then releases m. The incloop(iv, m) process implements the counter IncI interface. The call to notifyNZ(m, s, m′) after incrementing n will cause a waiting DecI thread (if any) to be awakened, and to continue by applying the pending action to the Rep state s in which n > 0 holds, before passing the updated state m′ to the incloop recursive call. Affinity plays a key role, allowing all data structures, including waiting continuations, to be safely discarded at the end of any computation. We have only shown some code snippets here; the complete code is available in the CLASS distribution.
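
The await/notify pattern with explicit continuations can be sketched as follows in Python. This is a simplified analogy with hypothetical names (`CounterMonitor`, `inc`, `dec`, `value`): the wait queue is a FIFO `deque` of callbacks (the queueing order is an assumption of this sketch), and the pending decrement is applied by the notifier before the waiting continuation is resumed, so that the invariant n ≥ 0 is preserved.

```python
import threading
from collections import deque


class CounterMonitor:
    """Counter with invariant n >= 0; dec is asynchronous and takes a continuation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._n = 0
        self._waitq = deque()          # continuations waiting for condition NZ (n > 0)

    def inc(self):
        with self._lock:
            self._n += 1
            cont = self._notify_nz()   # hand the state to one waiter, if any
        if cont is not None:
            cont()                     # resume the waiter outside the lock

    def dec(self, cont):
        with self._lock:
            if self._n == 0:
                self._waitq.append(cont)   # await NZ: park the continuation
                return
            self._n -= 1
        cont()

    def _notify_nz(self):
        # Called with the lock held and n > 0: apply the pending decrement
        # on behalf of the first waiter and return its continuation.
        if self._waitq:
            self._n -= 1
            return self._waitq.popleft()
        return None

    def value(self):
        with self._lock:
            return self._n
```

A decrement issued while the counter is zero does not block the caller: its continuation runs later, when some increment establishes n > 0. For instance, after `m.dec(k)` on a fresh monitor, `k` is only invoked by the next `m.inc()`, and the counter is back to zero afterwards.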

Our examples illustrate how our system types non-trivial concurrent code, akin to real system-level code, involving higher-order state, rich sharing and ownership transfer patterns, while ensuring deadlock freedom, livelock freedom and memory safety. Our typing of sharing imposes that only a single bundle of linear resources may be shared by two independent threads. As our examples show, code can often be structured in that way, so that bundles of many linear resources may be safely shared by monitor-like structures, exposing informative typed interfaces.

The feasibility of CLASS is corroborated by our implementation [68] of a fully-fledged type checker and interpreter, developed in Java (∼15k), and packaged with an extensive CLASS library of code and test suites (∼10k), including all the examples in this paper. Type checking is decidable in polynomial time, using minimal type annotations, only on cut-bound names and function parameters; the multiplicative rules are handled by lazy context splitting (cf. [41]). The type checker ensures that corecursive calls are done on a session hereditarily descendant from the corecursion parameter, a condition motivated by our SN result (Theorem 3). We also support an unsafe corecursion mode, in which this check is turned off, to type programs defined by general corecursion.

The type checker supports useful type inference and reconstruction abilities. The interpreter uses the java.util.concurrent.\* package [53], using primitives such as fine-grained locks and condition variables to emulate the synchronous interactions of CLASS sessions and a cached thread pool to manage the life cycle of short-lived threads. Cell deallocation is implemented by reference counting, incremented on each share and decremented on each release. Forwarding redirects the clients of a shared cell through a chain of forwarding pointers (cf. [9]).
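
The reference-counting scheme for cell deallocation can be illustrated with a small sketch. It is written in Python rather than Java, and it is not the interpreter's actual code: the class `SharedCell` and the `on_free` callback are hypothetical, introduced only to show a counter that is incremented on each share, decremented on each release, and triggers deallocation when it reaches zero.

```python
import threading


class SharedCell:
    """Cell whose contents are discarded when the last usage is released."""

    def __init__(self, value, on_free):
        self._value = value
        self._on_free = on_free        # hypothetical deallocation hook
        self._refs = 1                 # the creator holds the first usage
        self._lock = threading.Lock()

    def share(self):
        # One usage is split into two: the count grows by one.
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("cell already deallocated")
            self._refs += 1

    def release(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("cell already deallocated")
            self._refs -= 1
            last = self._refs == 0
        if last:
            self._on_free(self._value)  # discard the (affine) contents
```

With one share followed by two releases, the hook runs exactly once, on the second release, which corresponds to both clients of a shared cell having released their usages.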

# 5 Related Work

Many resource-aware logics and type systems to tame shared state and interference have been proposed [3,18,57,77,44,17,60,61,24]. These systems adopt some form of linearity and/or affinity for resourceful programming [75,30] and to model failures/exceptions [28,59,20,36,52]. In CLASS, linearity allows us to control state sharing, whereas affinity is useful to ensure memory safety and to represent safely finalizable or abortable computations. The hereditary session-discarding behaviour of affine sessions, modelled by rule [∧∨d], is also present in other works, e.g. [6,59,20].

CLASS builds on top of the PaT correspondence with Linear Logic [22,27,80], the logical principles for the state modalities being inspired by DiLL [35]. Recent works [43,9,10,7,50,64,67] also address the problem of sharing and nondeterminism in the setting of session-based PaT. In [67], reference cells may only store replicated sessions (of type !A), thus cannot refer to linear entities such as other cells or linear sessions, hence cannot represent many realistic programming idioms that CLASS does (see Section 4). Accommodating linear state in a pure PaT approach is thus addressed in this work with a novel, more fundamental approach. Furthermore, in [67], recursion is obtained via a system-F style encoding [79], which cannot model inductive stateful structures with updates in-place as we do with CLASS native inductive/coinductive types.

The take/put operations of CLASS relate to Concurrent Haskell MVars [45] and to the acquire/release operations of the manifest sharing session-typed language SILL<sup>S</sup> [9,10]. Sharing in SILL<sup>S</sup> is based on shift modalities to move from shared to linear mode and back, and contraction principles to alias shared sessions. In CLASS we explore DiLL modalities and cocontraction principles [35] to express sharing of linear state and put/take protocols of mutex memory cells of invariant type. The work [10] ensures deadlock-freedom by relying on programmer-provided partial orders on events [55,33,26], whereas in CLASS deadlock-freedom follows the same simple and general inductive argument of the corresponding result in e.g. [22], thanks to the logical character of the new proof rules (DiLL cocontraction, which enjoys cut-elimination). The work [64] introduces the language CSLL, by extending linear logic with coexponentials that support a notion of shared state, with an approach quite different from ours. CSLL does not claim the ability to naturally express shared linked data structures with update in-place and fine-grained locking, as CLASS does. Nevertheless, it is natural to define in CLASS sessions exporting weakening, sharing and dereliction capabilities for linear behaviours, as in our shared buffer example.

Recently, the work [43] develops λlock, a substructural-typed λ-calculus with higher-order locks, which enjoys deadlock-freedom by imposing a set of high-level principles that guarantee acyclicity of the lock-sharing topologies, and which follow in CLASS as a consequence of its logically motivated type system and DiLL's cocontraction. This work also extends λlock with partial orders, in which a resource can be shared by more than two concurrent threads. None of the models in [43,9,10,64] addresses livelock absence or memory safety, as CLASS does.

As far as we are aware, CLASS is a first proposal integrating shared state and recursion in a language based on PaT and Linear Logic, while guaranteeing strong normalisation. Least/greatest fixed points in Linear Logic were studied in [8], which inspired the development of recursion in [54,73]; our treatment of recursion draws inspiration from [73]. Several works exploit the technique of logical relations to establish strong normalisation for concurrent process calculi [1,83,69,16,62]. The work [16] proves strong normalisation for a language with higher-order store with a type and effect system that stratifies memory into regions so as to preclude circularities. Interestingly, in CLASS such stratification is implicitly guaranteed by the acyclicity inherent to Linear Logic. Linear logical relations were studied in [62,21,72,74]. In this work we recast and extend the technique to Classical Linear Logic, exploring orthogonality [38,8,1], and demonstrate, using a specially devised technique of interference-sensitive reducibility, how logical relations scale to accommodate shared state.

# 6 Concluding Remarks

We have introduced CLASS, a session-based language founded on a propositions-as-types interpretation of Second-Order Classical Linear Logic, extended with recursion, affine types, first-class mutex cells and shared linear state. We believe that CLASS is the first proposal of a language of its kind to provide the following three strong properties by static typing: well-typed CLASS programs enjoy progress, hence never deadlock, do not leak memory and always terminate.

CLASS metatheoretical properties are obtained in a compositional and modular way, by leveraging the key features of propositions-as-types, from which the operational semantics and type system also emerge. In CLASS, types and processes have a consistent proof-theoretical behaviour: typed program constructs correspond exactly to proof rules, with a proper compositional semantics via logical relations (Section 3). Programs are composed by plugging basic constructs with the cut rule, and all interaction principles are captured by principal cut reductions that act locally in proofs/type derivations (Def. 4). We also obtain an algebraic system based on proof simplification to reason about program (observational) equivalence, due to confluence (cf. [65]).

Besides the foundational relevance of our work, we also argued how CLASS can cleanly express realistic concurrent higher-order programming idioms, with many compelling examples. Any type system introduces conservative restrictions on its language, but we believe that CLASS offers an interesting balance between the strong properties it ensures by typing and its expressiveness. In fact, we find the CLASS type system helpful to guide the development of safe concurrent idioms, with a fairly light type annotation burden. As future work, we would like to investigate several possible refinements of the CLASS type discipline, namely, allowing finer-grained resource-access policies to be expressed, and exploring the integration of dependent and refinement types [71,51], enhancing the logical expressiveness of the basic type system.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Bunched Fuzz: Sensitivity for Vector Metrics

june wunder<sup>1</sup>, Arthur Azevedo de Amorim<sup>3</sup>, Patrick Baillot<sup>2</sup>, and Marco Gaboardi<sup>1</sup>

<sup>1</sup> Boston University, Boston, USA

jwunder@bu.edu

<sup>2</sup> Univ. Lille, CNRS, Inria, Centrale Lille, UMR9189 CRIStAL, F-59000 Lille, France

<sup>3</sup> Rochester Institute of Technology, Rochester, USA

Abstract. Program sensitivity measures the distance between the outputs of a program when run on two related inputs. This notion, which plays a key role in areas such as data privacy and optimization, has been the focus of several program analysis techniques introduced in recent years. Among the most successful ones, we can highlight type systems inspired by linear logic, as pioneered by Reed and Pierce in the Fuzz programming language. In Fuzz, each type is equipped with its own distance, and sensitivity analysis boils down to type checking. In particular, Fuzz features two product types, corresponding to two different notions of distance: the tensor product combines the distances of each component by adding them, while the with product takes their maximum.

In this work, we show that these products can be generalized to arbitrary L<sup>p</sup> distances, metrics that are often used in privacy and optimization. The original Fuzz products, tensor and with, correspond to the special cases L<sup>1</sup> and L<sup>∞</sup>. To ease the handling of such products, we extend the Fuzz type system with bunches (as in the logic of bunched implications), where the distances of different groups of variables can be combined using different L<sup>p</sup> distances. We show that our extension can be used to reason about quantitative properties of probabilistic programs.

# 1 Introduction

When developing a data-driven application, we often need to analyze its sensitivity, or robustness, a measure of how its outputs can be affected by varying its inputs. For example, to analyze the privacy guarantees of a program, we might consider what happens when we include the data of one individual in its inputs [11]. When analyzing the stability of a machine-learning algorithm, we might consider what happens when we modify one sample in the training set [7].

Such applications have spurred the development of several techniques to reason about program sensitivity [23,9]. One successful approach is based on linear-like [14] type systems, as pioneered in Reed and Pierce's Fuzz language [23].

The basic idea behind Fuzz is to use typing judgments to track the sensitivity of a program with respect to each variable. Each type comes equipped with a notion of distance, and the typing rules explain how to update variable sensitivities for each operation. Because different distances yield different sensitivity analyses, it is often useful to endow a set of values with different distances, which leads to different Fuzz types. For example, like linear logic, Fuzz has two notions of products: the tensor product ⊗ and the Cartesian product & (with). The first one is equipped with the L<sup>1</sup> (or Manhattan) distance, where the distance between two pairs is computed by adding the distances between the corresponding components. The second one is equipped with the L<sup>∞</sup> (or Chebyshev) distance, where the component distances are combined by taking their maximum.

The reason for focusing on these two product types is that they play a key role in differential privacy [11], a rigorous notion of privacy that was the motivating application behind the original Fuzz design. However, we could also consider equipping pairs with more general L<sup>p</sup> distances, which interpolate between the L<sup>1</sup> and L<sup>∞</sup> distances and are extensively used in convex optimization [8], information theory [10] and statistics [15]. Indeed, other type systems for differential privacy inspired by Fuzz [20] include types for vectors and matrices under the L<sup>2</sup> distance, which are required to use the Gaussian mechanism, one of the popular building blocks of differential privacy. Supporting more general L<sup>p</sup> metrics would allow us to capture even more such building blocks [17,1], which would enable further exploration of the tradeoffs between differential privacy and accuracy.

In this paper, we extend these approaches and show that Fuzz can be enriched with a family of tensor products ⊗<sub>p</sub>, for 1 ≤ p ≤ ∞. These tensor products are equipped with the L<sup>p</sup> distance, the original Fuzz products ⊗ and & corresponding to the special cases ⊗<sub>1</sub> and ⊗<sub>∞</sub>. Moreover, each connective ⊗<sub>p</sub> is equipped with a corresponding "linear implication" ⊸<sub>p</sub>, unlike previous related systems where such an implication only exists for p = 1. Following prior work [4,3], we give our extension a semantics in terms of non-expansive functions, except that the presence of the implications ⊸<sub>p</sub> forces us to equip input and output spaces with more general distances where the triangle inequality need not hold.

A novelty of our approach is that, to support the handling of such products, we generalize Fuzz environments to bunches, where each L<sup>p</sup> distance comes with its own context former. Thus, we call our type system Bunched Fuzz. This system, inspired by languages derived from the logic of Bunched Implications (BI) [22] (e.g. [21]), highlights differences between the original Fuzz design and linear logic: for example, products distribute over sums in Fuzz and BI, but not in linear logic. While similar indexed products and function spaces have also appeared in the literature, particularly in works on categorical grammars [19], here they are employed to reason about vector distances and function sensitivity.

While designing Bunched Fuzz, one of our goals was to use sensitivity to reason about randomized algorithms. In the original Fuzz, probability distributions are equipped with the max divergence distance, which can be used to state differential privacy as a sensitivity property [23]. Subsequent work has shown how Fuzz can also accommodate other distances over probability distributions [3]. However, such additions required variants of graded monads, which express the distance between distributions using indices (i.e. grades) on the monadic type of distributions over their results, as opposed to sensitivity indices on their inputs, as was done in the original Fuzz. In particular, this makes it more difficult to reason about distances separately with respect to each input. Thanks to bunches, however, we can incorporate these composition principles more naturally. For example, Bunched Fuzz can reason about the Hellinger distance on distributions without the need for output grading, as was done in prior systems [3].

We will also see that, by allowing arbitrary L<sup>p</sup> norms, we can generalize prior case studies that were verified in Fuzz and obtain more general methods for reasoning about differential privacy (Section 5). Consider the L<sup>p</sup> mechanism [1,17], which adds noise to the result of a query whose sensitivity is measured in the L<sup>p</sup> norm. Since Fuzz does not have the means to analyze such a sensitivity measure, it cannot implement the L<sup>p</sup> mechanism; Bunched Fuzz, however, can analyze such a measure, and thus allows for a simple implementation in terms of the exponential mechanism. Such a mechanism, in turn, can be used to implement a variant of a gradient descent algorithm that works under the L<sup>p</sup> norm, generalizing an earlier version that was biased towards the L<sup>1</sup> norm [25]. Summarizing, our contributions are:


Check the full version of this paper for more technical details [26].

# 2 Background

#### 2.1 Metrics and Sensitivity

To discuss sensitivity, we first need a notion of distance. We call extended pseudosemimetric space a pair X = (|X|, d<sub>X</sub>) consisting of a carrier set |X| and an extended pseudosemimetric d<sub>X</sub> : |X|<sup>2</sup> → ℝ<sup>≥0</sup><sub>∞</sub>, which is a function satisfying, for all x, y ∈ |X|, d<sub>X</sub>(x, x) = 0 and d<sub>X</sub>(x, y) = d<sub>X</sub>(y, x). This relaxes the standard notion of metric space in a few respects. First, the distance between two points can be infinite, hence the extended. Second, different points can be at distance zero, hence the pseudo. Finally, we do not require the triangle inequality:

$$d\_X(x, y) \le d\_X(x, z) + d\_X(z, y),\tag{1}$$

hence the semi. We focus on extended pseudosemimetrics because they support constructions that true metrics do not. In particular, they make it possible to scale the distance of a space by ∞ and enable more general function spaces. However, to simplify the terminology, we will drop the "extended pseudosemi" prefix in the rest of the paper, and speak solely of metric spaces. On some occasions, we might speak of a proper metric space, by which we mean a space where the triangle inequality does hold (but not necessarily the other two requirements that are missing compared to the traditional definition of metric space).

Given a function f : X → Y on metric spaces, we say that it is s-sensitive, for s in ℝ<sup>≥0</sup><sub>∞</sub>, if we have, for all x<sub>1</sub>, x<sub>2</sub> ∈ X, d<sub>Y</sub>(f(x<sub>1</sub>), f(x<sub>2</sub>)) ≤ s · d<sub>X</sub>(x<sub>1</sub>, x<sub>2</sub>), where we extend addition and multiplication to ℝ<sup>≥0</sup><sub>∞</sub> by setting ∞ · s = s · ∞ = ∞. We also say that f is s-Lipschitz continuous, though the traditional definition of Lipschitz continuity assumes s ≠ ∞. If a function is s-sensitive, then it is also s′-sensitive for every s′ ≥ s. Every function of type X → Y is ∞-sensitive. If a function is 1-sensitive, we also say that it is non-expansive. We use X ⊸ Y to denote the set of such non-expansive functions. The identity function is always non-expansive, and non-expansive functions are closed under composition. Thus, metric spaces and non-expansive functions form a category, denoted Met.
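As an illustration of this definition, the following Python sketch tests s-sensitivity on a finite sample of points. The helper names `ext_mul` and `is_s_sensitive` are ours, and a finite sample can only refute sensitivity, never establish it; the sketch merely spells out the inequality and the convention ∞ · s = s · ∞ = ∞.

```python
import itertools
import math


def ext_mul(s, d):
    """Multiplication on [0, inf] with the convention inf * s = s * inf = inf."""
    if s == math.inf or d == math.inf:
        return math.inf
    return s * d


def is_s_sensitive(f, s, points, d_in, d_out, tol=1e-12):
    """Check d_out(f(x1), f(x2)) <= s * d_in(x1, x2) on all sampled pairs."""
    return all(
        d_out(f(x1), f(x2)) <= ext_mul(s, d_in(x1, x2)) + tol
        for x1, x2 in itertools.combinations(points, 2)
    )


def abs_dist(a, b):
    """The usual distance on the reals."""
    return abs(a - b)


def double(x):
    return 2 * x


def cube(x):
    return x ** 3


SAMPLE = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Doubling is 2-sensitive but not non-expansive.
DOUBLE_IS_2_SENSITIVE = is_s_sensitive(double, 2, SAMPLE, abs_dist, abs_dist)
DOUBLE_IS_NON_EXPANSIVE = is_s_sensitive(double, 1, SAMPLE, abs_dist, abs_dist)
# Any function passes the check for s = inf.
CUBE_IS_INF_SENSITIVE = is_s_sensitive(cube, math.inf, SAMPLE, abs_dist, abs_dist)
```

On this sample, doubling passes the check for s = 2 and fails it for s = 1, while the cube function passes for s = ∞, matching the remark that every function is ∞-sensitive.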

#### 2.2 Distances for Diferential Privacy

Among many applications, sensitivity is a useful notion because it provides a convenient language for analyzing the privacy guarantees of algorithms, specifically in the framework of differential privacy [11]. Differential privacy is a technique for protecting the privacy of individuals in a database by blurring the results of a query to the database with random noise. The noise is calibrated so that each individual has a small influence on the probability of observing each outcome (while ideally guaranteeing that the result of the query is still useful).

Formally, suppose that we have some set of databases db equipped with a metric. This metric roughly measures how many rows differ between two databases, though the exact definition can vary. Let f : db → DX be a randomized database query, which maps a database to a discrete probability distribution over the set of outcomes X. We say that f is ϵ-differentially private if it is an ϵ-sensitive function from db to DX, where the set of distributions DX is equipped with the following distance, sometimes known as the max divergence:

$$\text{MD}\_X(\mu\_1, \mu\_2) = \sup\_{x \in X} \left| \ln \frac{\mu\_1(x)}{\mu\_2(x)} \right| \,. \tag{2}$$

(Here, we stipulate that |ln(0/0)| = 0 and |ln(p/0)| = |ln(0/p)| = ∞ for p ≠ 0.)

To understand this definition, suppose that D<sub>1</sub> and D<sub>2</sub> are two databases at distance 1 (for instance, because they differ with respect to the data of a single individual). If f is ϵ-differentially private, the above definition implies that f(D<sub>1</sub>) and f(D<sub>2</sub>) are at most ϵ apart. When ϵ is large, the probabilities of each outcome in the result distributions can vary widely. This means that, by simply observing one output of f, we might be able to guess with good confidence which of the databases D<sub>1</sub> or D<sub>2</sub> was used to produce that output. Conversely, if ϵ is small, it is hard to tell which database was used because the output probabilities will be close. For this reason, it is common to view ϵ as a privacy loss: the larger it is, the more privacy we are giving up to reveal the output of f.
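To make the privacy distance concrete, the sketch below computes the largest absolute log-ratio of outcome probabilities between two finite distributions, represented as Python dictionaries from outcomes to probabilities. The function name `max_divergence` and the randomized-response example (report a bit truthfully with probability 3/4) are our own illustration and do not come from the paper.

```python
import math


def max_divergence(mu1, mu2):
    """Largest |ln(mu1(x) / mu2(x))| over all outcomes x.

    Outcomes with probability zero on both sides contribute 0; an outcome
    with probability zero on exactly one side makes the distance infinite.
    """
    worst = 0.0
    for x in set(mu1) | set(mu2):
        p, q = mu1.get(x, 0.0), mu2.get(x, 0.0)
        if p == 0.0 and q == 0.0:
            continue
        if p == 0.0 or q == 0.0:
            return math.inf
        worst = max(worst, abs(math.log(p / q)))
    return worst


def randomized_response(bit, truth_prob=0.75):
    """Hypothetical query on a one-bit database: report the bit, or flip it."""
    return {bit: truth_prob, 1 - bit: 1.0 - truth_prob}


# Two databases at distance 1 (the single bit differs).
PRIVACY_LOSS = max_divergence(randomized_response(0), randomized_response(1))
```

Here the two output distributions are at distance ln 3, so, under the assumption that the two one-bit databases are at distance 1, this query is (ln 3)-differentially private. A deterministic query that simply returned the bit would give an infinite distance.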

Besides providing a strong privacy guarantee, this formulation of closeness for distributions provides two important properties. First, we can compose differentially private algorithms without ruining their privacy guarantee. Note that DX forms a monad, where the return and bind operations are given as follows:

$$\eta(x) = y \mapsto \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

$$f^\dagger(\mu) = y \mapsto \sum\_{x \in X} \mu(x) \cdot f(x)(y). \tag{4}$$

Intuitively, the return operation produces a deterministic distribution, whereas bind samples an element x from µ and computes f(x). When composing differentially private algorithms, their privacy loss can be soundly added together:

Theorem 1. Suppose that f : db → DX is ϵ<sub>1</sub>-differentially private and that g : db → X → DY is such that the mapping δ ↦ g(δ)(x) is ϵ<sub>2</sub>-differentially private for every x. Then the composite h : db → DY defined as h(δ) = g(δ)<sup>†</sup>(f(δ)) is (ϵ<sub>1</sub> + ϵ<sub>2</sub>)-differentially private.
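The monad operations and the composition bound of Theorem 1 can be checked numerically on a small example. In the sketch below, finite distributions are dictionaries; `ret` and `bind` follow Equations (3) and (4), while the queries `f`, `g` and the one-bit database are hypothetical choices of ours, with privacy losses ln 3 and ln 2 respectively.

```python
import math


def ret(x):
    """Return: the deterministic distribution concentrated on x."""
    return {x: 1.0}


def bind(mu, f):
    """Bind: sample x from mu, then sample from f(x)."""
    out = {}
    for x, px in mu.items():
        for y, py in f(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out


def max_divergence(mu1, mu2):
    """Largest absolute log-ratio between two finite distributions."""
    worst = 0.0
    for x in set(mu1) | set(mu2):
        p, q = mu1.get(x, 0.0), mu2.get(x, 0.0)
        if p == 0.0 and q == 0.0:
            continue
        if p == 0.0 or q == 0.0:
            return math.inf
        worst = max(worst, abs(math.log(p / q)))
    return worst


def noisy_bit(bit, truth_prob):
    """Report the bit truthfully with probability truth_prob."""
    return {bit: truth_prob, 1 - bit: 1.0 - truth_prob}


def f(db):
    # Privacy loss ln 3 between the databases 0 and 1.
    return noisy_bit(db, 3 / 4)


def g(db):
    # For every fixed x, db -> g(db)(x) has privacy loss ln 2.
    return lambda x: {(x, y): py for y, py in noisy_bit(db, 2 / 3).items()}


def h(db):
    # The composite of Theorem 1: bind g(db) to the result of f(db).
    return bind(f(db), g(db))


COMPOSED_LOSS = max_divergence(h(0), h(1))
BOUND = math.log(3) + math.log(2)
```

For this example the bound is tight: the composite has privacy loss ln 6 = ln 3 + ln 2.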

The other reason why the privacy metric is useful is that it supports many building blocks for differential privacy. Of particular interest is the Laplace mechanism, which blurs a numeric result with noise drawn from the two-sided Laplace distribution. If x ∈ ℝ, let L(x) be the distribution with density<sup>4</sup> y ↦ ½ e<sup>−|x−y|</sup>.

Theorem 2. The mechanism L is a non-expansive function of type ℝ → Dℝ.<sup>5</sup>

Thus, to define an ϵ-differentially private numeric query on a database, it suffices to define an ϵ-sensitive, deterministic numeric query, and then blur its result with Laplace noise. Differential privacy follows from the composition principles for sensitivity. This reasoning is justified by the fact that the Laplace mechanism adds noise proportional to the sensitivity of the numeric query in L<sup>1</sup> distance.
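The following sketch illustrates Theorem 2 numerically for the scale-1 Laplace density. It works with the continuous density directly (so the caveat about discretization still applies), checks the log-ratio bound only on a grid of outputs, and samples the noise as a difference of two exponential variates, a standard way of obtaining Laplace noise; all function names are ours.

```python
import math
import random


def laplace_density(x, y):
    """Density at y of the Laplace distribution centered at x, with scale 1."""
    return 0.5 * math.exp(-abs(x - y))


def worst_log_ratio(x1, x2, ys):
    """Largest |ln(density_x1(y) / density_x2(y))| over the sampled outputs ys."""
    return max(
        abs(math.log(laplace_density(x1, y) / laplace_density(x2, y)))
        for y in ys
    )


def laplace_mechanism(x, rng):
    """Blur x with scale-1 Laplace noise.

    The difference of two independent Exp(1) samples is Laplace(0, 1).
    """
    return x + rng.expovariate(1.0) - rng.expovariate(1.0)


GRID = [k / 2 for k in range(-40, 41)]
X1, X2 = 0.3, 1.0
# Non-expansiveness: the log-ratio never exceeds the input distance |X1 - X2|.
WORST = worst_log_ratio(X1, X2, GRID)

rng = random.Random(0)
noisy_answer = laplace_mechanism(42.0, rng)
```

The bound follows from the reverse triangle inequality, since the log-ratio at y equals ||x<sub>2</sub> − y| − |x<sub>1</sub> − y||, which is at most |x<sub>1</sub> − x<sub>2</sub>|.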

#### 2.3 Sensitivity as a Resource

Because differential privacy is a sensitivity property, techniques for analyzing the sensitivity of programs can also be used to analyze their privacy guarantees. One particularly successful approach in this space is rooted in type systems inspired by linear logic, as pioneered by Reed and Pierce in the Fuzz programming language [16,23]. At its core, Fuzz is just a type system for tracking sensitivity.

<sup>4</sup> We use here a Laplace distribution with scale 1.

<sup>5</sup> The definitions do not quite match our setting, since L is a continuous, and not a discrete, distribution. The result can be put on firm footing by working with a discretized version of the Laplace distribution [12].

Typing judgments are similar to those of common functional programming languages, but variable declarations are of the form x<sub>i</sub> :<sub>r<sub>i</sub></sub> τ<sub>i</sub>:

$$x\_1 :\_{r\_1} \tau\_1, \ldots, x\_n :\_{r\_n} \tau\_n \vdash e : \sigma.$$

The annotations r<sub>i</sub> ∈ ℝ<sup>≥0</sup><sub>∞</sub> are sensitivity indices, whose purpose is to track the effect that changes to the program input can have on its output: if we have two substitutions γ and γ′ for the variables x<sub>i</sub>, then the metric preservation property of the Fuzz type system guarantees that

$$d(e[\gamma/\vec{x}], e[\gamma'/\vec{x}]) \le \sum\_i r\_i \cdot d(\gamma(x\_i), \gamma'(x\_i)),\tag{5}$$

where the metrics d are computed based on the type of each expression and value. This means that we can bound the distance on the results of the two runs of e by adding up the distances of the inputs scaled by their corresponding sensitivities. When this bound is finite, the definition of the metrics guarantees that the two runs have the same termination behavior. When r<sub>i</sub> = ∞, the above inequality provides no guarantees if the value of x<sub>i</sub> varies.

Fuzz includes data types commonly found in functional programming languages, such as numbers, products, tagged unions, recursive types and functions. The typing rules of the language explain how the sensitivities of each variable must be updated to compute each operation. The simplest typing rule says that, in order to use a variable, its declared sensitivity must be at least 1:

$$\frac{r \ge 1}{\Gamma, x:\_r \tau, \Delta \vdash x: \tau}$$

As a more interesting example, to construct a pair (e1, e2), the following rule says that we need to add the sensitivities of the corresponding contexts:

$$\frac{\Gamma\_1 \vdash e\_1 : \tau\_1 \qquad \Gamma\_2 \vdash e\_2 : \tau\_2}{\Gamma\_1 + \Gamma\_2 \vdash (e\_1, e\_2) : \tau\_1 \otimes \tau\_2}.$$

This behavior is a result of the distance of the tensor type ⊗: the distance between two pairs in τ<sub>1</sub> ⊗ τ<sub>2</sub> is the result of adding the distances between the first and second components; therefore, the sensitivity of each variable for the entire expression is the sum of the sensitivities for each component. In this sense, sensitivities in Fuzz behave like a resource that must be distributed across all variable uses in a program. For the sake of analogy, we might compare this treatment to how fractional permissions work in separation logic: the predicate l ↦<sub>q</sub> x indicates that we own a fraction q ∈ [0, 1] of a resource stating that l points to x. If q = q<sub>1</sub> + q<sub>2</sub>, we can split this predicate as l ↦<sub>q<sub>1</sub></sub> x ∗ l ↦<sub>q<sub>2</sub></sub> x, allowing us to distribute this resource between different threads.
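The resource reading of sensitivities can be sketched for a tiny fragment with only variables and tensor pairs. In the Python sketch below, a context is a dictionary from variable names to sensitivities, `ctx_add` is the pointwise sum used by the pairing rule, and the last lines check the metric preservation bound of Equation (5) on one example. The tuple encoding of expressions and all function names are assumptions made for this illustration, not the syntax of Fuzz.

```python
def ctx_add(g1, g2):
    """Pointwise sum of two sensitivity contexts (missing variables count as 0)."""
    return {v: g1.get(v, 0) + g2.get(v, 0) for v in g1.keys() | g2.keys()}


def infer(e):
    """Smallest context typing e, for variables and tensor pairs only."""
    if e[0] == "var":
        return {e[1]: 1}                            # a use needs sensitivity >= 1
    if e[0] == "pair":
        return ctx_add(infer(e[1]), infer(e[2]))    # contexts are added
    raise ValueError(f"unsupported expression: {e!r}")


def evaluate(e, env):
    """Evaluate e under the substitution env."""
    if e[0] == "var":
        return env[e[1]]
    return (evaluate(e[1], env), evaluate(e[2], env))


def dist(a, b):
    """Distance on reals, and the tensor (sum) distance on nested pairs."""
    if isinstance(a, tuple):
        return dist(a[0], b[0]) + dist(a[1], b[1])
    return abs(a - b)


# The expression (x, (x, y)) uses x twice and y once.
EXPR = ("pair", ("var", "x"), ("pair", ("var", "x"), ("var", "y")))
CTX = infer(EXPR)

GAMMA1 = {"x": 1.0, "y": 5.0}
GAMMA2 = {"x": 1.5, "y": 4.0}
LHS = dist(evaluate(EXPR, GAMMA1), evaluate(EXPR, GAMMA2))
RHS = sum(r * dist(GAMMA1[v], GAMMA2[v]) for v, r in CTX.items())
```

The inferred context assigns sensitivity 2 to x and 1 to y, and the distance between the two results does not exceed the weighted sum of the input distances.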

The distance on ⊗ corresponds to the sum in the upper bound in the statement of metric preservation (Equation (5)). This distance is useful because it is the one that yields good composition principles for differential privacy. This can be seen in the typing rule for sampling from a probabilistic distribution:

$$\frac{\Gamma \vdash e\_1 : \bigcirc \tau \qquad \Delta, x ::\_r \tau \vdash e\_2 : \bigcirc \sigma}{\Gamma + \Delta \vdash \text{mlet } x = e\_1 \text{ in } e\_2 : \bigcirc \sigma}$$

Here, ○τ denotes the type of probability distributions over values of type τ. This operation samples a value x from the distribution e<sub>1</sub> and uses this value to compute the distribution e<sub>2</sub>. We can justify the soundness of this rule by reducing it to Theorem 1: the addition on contexts corresponds to the fact that the privacy loss of a program degrades linearly under composition.

Besides the tensor product ⊗, Fuzz also features a with product &, where the distances between components are combined by taking their maximum. This leads to a different typing rule for & pairs, which does not add up the sensitivities:

$$\frac{\Gamma \vdash e\_1 : \tau\_1 \qquad \Gamma \vdash e\_2 : \tau\_2}{\Gamma \vdash (e\_1, e\_2) : \tau\_1 \& \tau\_2}$$

If we compare these rules for pairs, we see a clear analogy with linear logic: ⊗ requires us to combine contexts, whereas & allows us to share them. Fuzz's elimination rules for products continue to borrow from linear logic: deconstructing a tensor gives both elements but deconstructing a with product returns only one.

$$\frac{\Gamma \vdash e : \tau\_1 \otimes \tau\_2 \qquad \Delta, x :\_r \tau\_1, y :\_r \tau\_2 \vdash e' : \tau'}{\Delta + r\Gamma \vdash \textbf{let} \ (x, y) = e \ \textbf{in} \ e' : \tau'} \qquad \frac{\Gamma \vdash e : \tau\_1 \& \tau\_2}{\Gamma \vdash \pi\_i \ e : \tau\_i}$$

This partly explains why the connectives' distances involve addition and maximum. When using a tensor product, both elements can affect how much the output can vary, so both elements must be considered. (Note that Fuzz is an affine type system: we are free to ignore one of the product's components, and thus we can write projection functions out of a tensor product.) When projecting out of a with product, only one of the elements will affect the program's output, so we only need to consider the component that yields the maximum distance.

Fuzz uses the !<sub>s</sub> type for managing sensitivities. Intuitively, !<sub>s</sub>τ behaves like τ, but with the distances scaled by s; when s = ∞, this means that different points are infinitely apart. The introduction rule scales the sensitivities of variables in the environment. This can be used in conjunction with the elimination rule to propagate the sensitivity out of the type and into the environment.

$$\frac{\Gamma \vdash e : \tau}{s\Gamma \vdash !e : !\_{s}\tau} \qquad \qquad \frac{\Gamma \vdash e : !\_{s}\tau \qquad \Delta, x :\_{rs}\tau \vdash e' : \tau'}{\Delta + r\Gamma \vdash \textbf{let} \; !x = e \; \textbf{in} \; e' : \tau'}$$

Finally, the rules for the linear implication ⊸ are similar to the ones from linear logic, but adjusted to account for sensitivities.

$$\frac{\Gamma, x:\_1\tau \vdash e:\sigma}{\Gamma \vdash \lambda x.e:\tau \multimap \sigma} \qquad\qquad \frac{\Gamma \vdash e:\tau \multimap \sigma \qquad \Delta \vdash e':\tau}{\Gamma + \Delta \vdash e \, e':\sigma}$$

To introduce the linear implication ⊸, the bound variable needs to have sensitivity 1. When eliminating ⊸, the environments need to be added. In categorical language, addition, which is also present in the metric for ⊗, is connected to the fact that there is an adjunction between the functors X ⊗ (−) and X ⊸ (−).

#### 2.4 L<sup>p</sup> distances

The L<sup>1</sup> and L<sup>∞</sup> distances are instances of a more general family of L<sup>p</sup> distances (for p ∈ ℝ<sup>≥1</sup><sub>∞</sub>).<sup>6</sup> Given a sequence of distances x⃗ = (x<sub>1</sub>, …, x<sub>n</sub>) ∈ (ℝ<sup>≥0</sup><sub>∞</sub>)<sup>n</sup>, we first define the L<sup>p</sup> pseudonorm<sup>7</sup> as follows:

$$||\vec{x}||\_p = \left(\sum\_{i=1}^{n} x\_i^p\right)^{1/p}.$$

This definition makes sense whenever the distances x<sub>i</sub> and p are finite. When p = ∞, we define the right-hand side as the limit max<sub>1≤i≤n</sub> x<sub>i</sub>. When x<sub>i</sub> = ∞ for some i, we define the right-hand side as ∞. We have the following classical properties:

Proposition 1 (Hölder inequality). For all p, q ≥ 1 such that 1/p + 1/q = 1, and for all x⃗, y⃗ ∈ (ℝ<sup>≥0</sup><sub>∞</sub>)<sup>n</sup>, we have: Σ<sub>i=1</sub><sup>n</sup> x<sub>i</sub>y<sub>i</sub> ≤ ||x⃗||<sub>p</sub> ||y⃗||<sub>q</sub>. For p = 2, q = 2, this is the Cauchy-Schwarz inequality: Σ<sub>i=1</sub><sup>n</sup> x<sub>i</sub>y<sub>i</sub> ≤ ||x⃗||<sub>2</sub> ||y⃗||<sub>2</sub>.

Proposition 2. For 1 ≤ p ≤ q we have, for x⃗ ∈ (ℝ<sup>≥0</sup><sub>∞</sub>)<sup>n</sup>:

$$||\vec{x}||\_q \le ||\vec{x}||\_p \tag{6}$$

$$||\vec{x}||\_p \le n^{\frac{1}{p} - \frac{1}{q}} ||\vec{x}||\_q \tag{7}$$

$$||\vec{x}||\_2 \le ||\vec{x}||\_1 \le \sqrt{n} \ ||\vec{x}||\_2 \tag{8}$$

The L<sup>p</sup> pseudonorms yield distances on tuples. More precisely, suppose that (X<sub>i</sub>)<sub>1≤i≤n</sub> are metric spaces. The following defines a metric on X = X<sub>1</sub> × ⋯ × X<sub>n</sub>:

$$d\_p(\vec{x}, \vec{x}') = ||(d\_{X\_1}(x\_1, x\_1'), \dots, d\_{X\_n}(x\_n, x\_n'))||\_p$$

Proposition 3. For 1 ≤ p ≤ q we have, for x⃗, x⃗′ ∈ X<sub>1</sub> × ⋯ × X<sub>n</sub>:

$$d\_q(\vec{x}, \vec{x'}) \le d\_p(\vec{x}, \vec{x'}) \le n^{\frac{1}{p} - \frac{1}{q}} d\_q(\vec{x}, \vec{x'}) \tag{9}$$

$$d\_2(\vec{x}, \vec{x'}) \le d\_1(\vec{x}, \vec{x'}) \le \sqrt{n} \; d\_2(\vec{x}, \vec{x'}) \tag{10}$$
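The pseudonorm and the comparison inequalities above can be explored numerically. The Python sketch below implements the L<sup>p</sup> pseudonorm with the two limit conventions (p = ∞ and infinite components) and checks inequalities (6) and (7) on a sample vector; the function names are ours, and the check is an illustration on finitely many exponents rather than a proof.

```python
import math


def lp_norm(xs, p):
    """L^p pseudonorm of a sequence of distances in [0, inf], for p in [1, inf]."""
    if any(x == math.inf for x in xs):
        return math.inf                  # an infinite component dominates
    if p == math.inf:
        return max(xs, default=0.0)      # limit case: the maximum
    return sum(x ** p for x in xs) ** (1.0 / p)


def lp_distance(xs, ys, p):
    """L^p distance between tuples of reals, combining componentwise distances."""
    return lp_norm([abs(x - y) for x, y in zip(xs, ys)], p)


def check_proposition_2(xs, exponents, tol=1e-9):
    """Check ||x||_q <= ||x||_p <= n^(1/p - 1/q) ||x||_q whenever p <= q."""
    n = len(xs)
    for p in exponents:
        for q in exponents:
            if p > q:
                continue
            norm_p, norm_q = lp_norm(xs, p), lp_norm(xs, q)
            if norm_q > norm_p + tol:
                return False
            if norm_p > n ** (1 / p - 1 / q) * norm_q + tol:
                return False
    return True


SAMPLE = [3.0, 4.0, 12.0]
L1, L2, LINF = lp_norm(SAMPLE, 1), lp_norm(SAMPLE, 2), lp_norm(SAMPLE, math.inf)
PROP2_HOLDS = check_proposition_2(SAMPLE, [1, 1.5, 2, 3, math.inf])
```

For the sample vector (3, 4, 12), the three norms are 19, 13 and 12, which decrease as p grows, in line with inequality (6).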

# 3 Bunched Fuzz: Programming with L<sup>p</sup> Distances

As we discussed earlier, the L<sup>1</sup> distance is not the only distance on products with useful applications. In the context of differential privacy, for example, the L<sup>2</sup> distance is used to measure the sensitivity of queries when employing the Gaussian mechanism, a method for private data release that sanitizes data by adding Gaussian noise instead of Laplacian noise.<sup>8</sup>

It is possible to extend a Fuzz-like analysis with L<sup>2</sup> distances by adding primitive types and combinators for vectors. This was done, for instance, in the Duet language [20], which provides the Gaussian mechanism as one of the primitives for differential privacy. Such an extension can help verify a wide class of algorithms that manipulate vectors in a homogeneous fashion, but it makes it awkward to express programs that require finer-grained access to vectors.

<sup>6</sup> The L<sup>p</sup> distances can be defined with p ≥ 0 but for simplicity of our treatment we will only consider p ≥ 1.

<sup>7</sup> "pseudo-" because it can be infinite.

<sup>8</sup> Technically, the Gaussian mechanism is used to achieve a relaxation of differential privacy known as approximate, or (ϵ, δ)-differential privacy. Though this notion cannot be analyzed directly by classical verification techniques for differential privacy, it can be handled by recent extensions of Fuzz [3,20].

To illustrate this point, suppose that we have a non-expansive function f : ℝ<sup>2</sup> → ℝ, where the domain carries the L<sup>2</sup> metric. Consider the mapping

$$g(x, y) = f(2x, y) + f(2y, x).$$

How would we analyze the sensitivity of g? We cannot translate such a program directly into a system like Duet, since it does not allow us to manipulate L<sup>2</sup> vectors at the level of individual components. However, we could rewrite the definition of g to use matrix operations, which could be easily incorporated in a variant of Duet. Specifically, consider the following definition:

$$g(\vec{x}) = f\left(\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \vec{x}\right) + f\left(\begin{bmatrix} 0 & 2 \\ 1 & 0 \end{bmatrix} \vec{x}\right).$$

The L<sup>2</sup> sensitivity of a linear transformation x⃗ ↦ Mx⃗ can be easily computed if we know the coefficients of the matrix M. Note that

$$\begin{aligned} d(M\vec{x}, M\vec{y}) &= ||M\vec{x} - M\vec{y}||\_2 = ||M(\vec{x} - \vec{y})||\_2 = \frac{||M(\vec{x} - \vec{y})||\_2}{||\vec{x} - \vec{y}||\_2} ||\vec{x} - \vec{y}||\_2 \\ &\leq \left(\sup\_{\vec{z}} \frac{||M\vec{z}||\_2}{||\vec{z}||\_2}\right) d(\vec{x}, \vec{y}). \end{aligned}$$

The quantity sup<sub>z⃗</sub> ||Mz⃗||<sub>2</sub>/||z⃗||<sub>2</sub>, known as the operator norm of M, gives the precise sensitivity of the above operation, and can be computed by standard algorithms from linear algebra. In the case of g, both matrices have a norm of 2. This means that we can analyze the sensitivity of g compositionally, as in Fuzz: addition is 1-sensitive in each variable, so we just have to sum the sensitivities of x⃗ in each argument, yielding a combined sensitivity of 4. Unfortunately, this method of combining the sensitivities of each argument is too coarse when reasoning with L<sup>p</sup> distances, which leads to an imprecise analysis. To obtain a better bound, we can reason informally as follows. First, take

$$M = \begin{bmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 2 & 0 \end{bmatrix}^T.$$

We can compute the operator norm of M directly:

$$||M|| = \sup\_{x,y} \frac{\sqrt{2^2x^2 + y^2 + 2^2y^2 + x^2}}{\sqrt{x^2 + y^2}} = \sup\_{x,y} \frac{\sqrt{5(x^2 + y^2)}}{\sqrt{x^2 + y^2}} = \sqrt{5},$$

which implies that M is a √5-sensitive function of type R<sup>2</sup> → R<sup>4</sup> ≅ R<sup>2</sup> × R<sup>2</sup>. Moreover, thanks to Proposition 3, we can view addition (+) as a √2-sensitive operator of type R<sup>2</sup> → R, since

$$d\_{\mathbb{R}}(x\_1 + x\_2, y\_1 + y\_2) \le d\_{\mathbb{R}}(x\_1, y\_1) + d\_{\mathbb{R}}(x\_2, y\_2) = d\_1(\vec{x}, \vec{y}) \le \sqrt{2}\,d\_2(\vec{x}, \vec{y}).$$

$$\begin{aligned} \tau, \sigma, \rho &::= 1 \mid \mathbb{R} \mid !\_{s} \tau \mid \bigcirc\_{P} \tau \mid \bigcirc\_{H} \tau \mid \tau \multimap\_{p} \sigma \mid \tau \otimes\_{p} \sigma \mid \tau \oplus \sigma \\ e &::= x \mid r \in \mathbb{R} \mid () \mid \lambda x.e \mid e \, e \mid (e,e) \mid \mathsf{let}\,(x,y) = e \,\mathsf{in}\, e \\ &\ \mid \mathsf{inj}\_{i}\, e \mid (\mathsf{case}\ e \,\mathsf{of}\, x.\,e \mid y.\,e) \mid !e \mid \mathsf{let}\ !x = e \,\mathsf{in}\, e \\ &\ \mid \mathsf{mlet}\ x = e \,\mathsf{in}\, e \mid \mathsf{return}\, e \mid \cdots \end{aligned}$$

Fig. 1. Types and terms in Bunched Fuzz

Thus, by rewriting the definition of g as (+) ∘ (f × f) ∘ M, where f × f : R<sup>4</sup> ≅ R<sup>2</sup> × R<sup>2</sup> → R × R denotes the application of f in parallel, we can compute the sensitivity of g by multiplying the sensitivity of each stage, as √2 × 1 × √5 = √10 ≈ 3.16, which is strictly better than the previous bound.
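The arithmetic above can be replayed numerically. The following Python sketch is only an illustration and is not part of Bunched Fuzz; the helper `operator_norm` is a hypothetical name, and it is restricted to matrices with two columns so that the largest eigenvalue of MᵀM has a closed form.

```python
import math

def operator_norm(rows):
    """L2 operator norm of a matrix given as a list of 2-element rows.

    The norm is the square root of the largest eigenvalue of M^T M,
    which for a symmetric 2x2 matrix [[a, b], [b, c]] has a closed form.
    """
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return math.sqrt(largest)

m1 = [[2, 0], [0, 1]]   # (x, y) -> (2x, y)
m2 = [[0, 2], [1, 0]]   # (x, y) -> (2y, x)
stacked = m1 + m2       # (x, y) -> (2x, y, 2y, x), i.e. the matrix M

# Fuzz-style bound: sum the sensitivities of the two arguments of (+).
coarse = operator_norm(m1) + operator_norm(m2)
# Refined bound: sensitivity of (+) o (f x f) o M, stage by stage.
refined = math.sqrt(2) * 1 * operator_norm(stacked)

print(coarse, refined)
```

Under these assumptions the coarse bound evaluates to 4 and the refined one to √10, matching the informal reasoning in the text.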

Naturally, we could further extend Fuzz or Duet with primitives for internalizing this reasoning, but it would be preferable to use the original definition of g and automate the low-level reasoning about distances. In this section, we demonstrate how this can be done via Bunched Fuzz, a language that refines Fuzz by incorporating more general distances in its typing environments. Rather than assuming that input distances are always combined by addition, that is, by the L<sup>1</sup> distance, Bunched Fuzz allows them to be combined with arbitrary L<sup>p</sup> distances. This refinement allows us to analyze different components of a vector as individual variables, but also to split the sensitivity of these variables while accounting for their corresponding vector distances. In the remainder of this section, we present the syntax and type system of Bunched Fuzz, highlighting the main differences with respect to the original Fuzz design. Later, in Section 4, we will give a semantics to this language in terms of metric spaces, following prior work [3].

Types and Terms Figure 1 presents the grammar of types and the main term formers of Bunched Fuzz. They are similar to their Fuzz counterparts; in particular, there are types for real numbers, products, sums, functions, and a unit type. The main novelty is in the product type τ ⊗<sub>p</sub> σ, which combines the metrics of each component using the L<sup>p</sup> distance (cf. Section 2.4). The types τ ⊗<sub>1</sub> σ and τ ⊗<sub>∞</sub> σ subsume the types τ ⊗ σ and τ & σ in the original Fuzz language. Note that there is no term constructor or destructor for the Fuzz type &, since it is subsumed by ⊗<sub>∞</sub>. The type τ ⊸<sub>p</sub> σ represents non-expansive functions endowed with a metric that is compatible with the L<sup>p</sup> metric, in that currying works (cf. Section 5). We will sometimes write ⊗ for ⊗<sub>1</sub> and ⊸ for ⊸<sub>1</sub>.

Another novelty with respect to Fuzz is that there are two constructors for probability distributions, ○<sub>P</sub> and ○<sub>H</sub>. The first one carries the original Fuzz privacy metric, while the second one carries the Hellinger distance. As we will see shortly, the composition principle for the Hellinger distance uses a contraction operator for the L<sup>2</sup> distance, which was not available in the original Fuzz design. Both distribution types feature term constructors mlet and return for sampling from a distribution and for injecting values into distributions. To simplify the notation, we do not use separate versions of these term formers for each type.

Bunches Before describing its type system, we need to talk about how typing environments are handled in Bunched Fuzz. In the spirit of bunched logics, environments are bunches defined by the following grammar:

$$\Gamma, \Delta ::= \cdot \mid [x:\tau]\_s \mid \Gamma \,,\_p\, \Delta$$

The empty environment is denoted as ·. The form [x : τ]<sub>s</sub> states that the variable x has type τ and sensitivity s. The form Γ ,<sub>p</sub> ∆ denotes the concatenation of Γ and ∆, which is only defined when the two bind disjoint sets of variables. As we will see in Section 4, bunches will be interpreted as metric spaces, and the index p denotes which L<sup>p</sup> metric we will use to combine the metrics of Γ and ∆.

The type system features several operations and relations on bunches, which are summarized in Figure 2. We write Γ ↭ Γ′ to indicate that we can obtain Γ′ by rearranging commas up to associativity and commutativity, and by treating the empty environment as an identity element; Figure 2 has a precise definition. Observe that associativity only holds for equal values of p. This operation will be used to state a permutation rule for the type system of Bunched Fuzz.

Like in Fuzz, environments have a scaling operation sΓ which scales all sensitivities in the bunch by s. For example,

$$s([x:\tau]\_{r\_1},\_p[y:\sigma]\_{r\_2}) = ([x:\tau]\_{s\cdot r\_1},\_p[y:\sigma]\_{s\cdot r\_2}).$$

The exact definition of scaling in such graded languages is subtle, since minor variations can quickly lead to unsoundness. The definition we are using (∞ · 0 = 0 · ∞ = ∞), which goes back to prior work [3], is sound, but imprecise, since it leads to too many variables being marked as ∞-sensitive. It would also be possible to have a more precise variant that uses a non-commutative definition of multiplication on distances [4], but we keep the current formulation for simplicity. (For a more thorough discussion on these choices and their tradeoffs, see the "Zero and Infinity" example in Appendix B of the full version [26] of this paper.)
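To make the scaling operation concrete, here is a minimal Python sketch of bunches and of sΓ under the convention ∞ · 0 = 0 · ∞ = ∞. The class names `Empty`, `Leaf`, and `Comma` and the helpers `mul` and `scale` are hypothetical names introduced for this illustration; types are represented as plain strings.

```python
import math
from dataclasses import dataclass
from typing import Union

INF = math.inf

@dataclass(frozen=True)
class Empty:              # the empty bunch
    pass

@dataclass(frozen=True)
class Leaf:               # [x : tau]_s
    var: str
    typ: str
    sens: float

@dataclass(frozen=True)
class Comma:              # left ,_p right
    p: float
    left: "Bunch"
    right: "Bunch"

Bunch = Union[Empty, Leaf, Comma]

def mul(s, r):
    """Product of sensitivities with inf * 0 = 0 * inf = inf."""
    if s == INF or r == INF:
        return INF
    return s * r

def scale(s, bunch):
    """Scale every sensitivity in the bunch by s, keeping its shape."""
    if isinstance(bunch, Empty):
        return bunch
    if isinstance(bunch, Leaf):
        return Leaf(bunch.var, bunch.typ, mul(s, bunch.sens))
    return Comma(bunch.p, scale(s, bunch.left), scale(s, bunch.right))

gamma = Comma(2, Leaf("x", "R", 3.0), Leaf("y", "R", 0.0))
print(scale(2, gamma))    # y keeps sensitivity 0
print(scale(INF, gamma))  # both x and y become inf-sensitive
```

The second call shows the imprecision mentioned above: scaling by ∞ marks y as ∞-sensitive even though it started with sensitivity 0.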

In the original Fuzz type system, rules with several premises usually have their environments combined by adding sensitivities pointwise, which corresponds to a use of the L<sup>1</sup> metric. In Bunched Fuzz, we have instead a family of contraction operations Contr(p, Γ, ∆) for combining environments, one for each L<sup>p</sup> metric. Contraction only makes sense if Γ and ∆ differ only in sensitivities and variable names, but have the same structure otherwise. We write this relation as Γ ≈ ∆. When contracting two leaves, sensitivities are combined using the L<sup>p</sup> norm, while keeping variable names from the left bunch.

Unlike Fuzz, where contraction is implicit in rules with multiple premises, Bunched Fuzz has a separate, explicit contraction typing rule. The rule will be stated using the vars function, which lists all variables in a bunch.

Type System Our type system is similar to that of Fuzz, but adapted to use bunched environments. The typing rules are displayed in Figure 3. For example, in the ⊗I rule, notice that the p on the tensor type is carried over to the bunch in the resulting environment. Similarly, in the ⊸I rule, the value of p that annotates the bunch in the premise is carried over to the ⊸<sub>p</sub> in the conclusion.

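$$\begin{array}{ll} \operatorname{vars}(\cdot) = [\,] & \cdot \approx \cdot \\ \operatorname{vars}([x : \tau]\_s) = [x] & [x : \tau]\_s \approx [y : \sigma]\_r \quad \text{if } \tau = \sigma \\ \operatorname{vars}(\Gamma\_1 \,,\_p\, \Gamma\_2) = \operatorname{vars}(\Gamma\_1) \mathbin{+\!\!+} \operatorname{vars}(\Gamma\_2) & \Gamma\_1 \,,\_p\, \Gamma\_2 \approx \Delta\_1 \,,\_q\, \Delta\_2 \quad \text{if } p = q \wedge \Gamma\_i \approx \Delta\_i \end{array}$$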

$$\begin{array}{ll} \Gamma \leftrightsquigarrow \Delta & \text{if } \Gamma = \Delta \\ \Gamma \leftrightsquigarrow \cdot \,,\_p\, \Delta & \text{if } \Gamma \leftrightsquigarrow \Delta \\ \Gamma \leftrightsquigarrow \Delta \,,\_p\, \cdot & \text{if } \Gamma \leftrightsquigarrow \Delta \\ \Gamma\_1 \,,\_p\, \Gamma\_2 \leftrightsquigarrow \Delta\_1 \,,\_p\, \Delta\_2 & \text{if } \Gamma\_i \leftrightsquigarrow \Delta\_i \\ \Gamma\_1 \,,\_p\, \Gamma\_2 \leftrightsquigarrow \Delta\_2 \,,\_p\, \Delta\_1 & \text{if } \Gamma\_i \leftrightsquigarrow \Delta\_i \\ \Gamma\_1 \,,\_p\, (\Gamma\_2 \,,\_p\, \Gamma\_3) \leftrightsquigarrow (\Delta\_1 \,,\_p\, \Delta\_2) \,,\_p\, \Delta\_3 & \text{if } \Gamma\_i \leftrightsquigarrow \Delta\_i \\ \Gamma\_2 \leftrightsquigarrow \Gamma\_1 & \text{if } \Gamma\_1 \leftrightsquigarrow \Gamma\_2 \end{array} \qquad \begin{array}{l} s\,\cdot = \cdot \\ s\,[x : \tau]\_r = [x : \tau]\_{s \cdot r} \\ s\,(\Delta\_1 \,,\_p\, \Delta\_2) = s\Delta\_1 \,,\_p\, s\Delta\_2 \end{array}$$

$$c(p,q) = \begin{cases} 1 & \text{if } p = \infty\\ 2^{\left| \frac{1}{q} - \frac{1}{p} \right|} & \text{otherwise} \end{cases}$$

$$\begin{aligned} \operatorname{Contr}(p, \cdot, \cdot) &= \cdot \\ \operatorname{Contr}(p, [x : \tau]\_s, [y : \tau]\_r) &= [x : \tau]\_{\sqrt[p]{s^p + r^p}} \\ \operatorname{Contr}(p, (\Gamma\_1 \,,\_q\, \Gamma\_2), (\Delta\_1 \,,\_q\, \Delta\_2)) &= c(p, q) (\operatorname{Contr}(p, \Gamma\_1, \Delta\_1) \,,\_q\, \operatorname{Contr}(p, \Gamma\_2, \Delta\_2)). \end{aligned}$$

Fig. 2. Bunch Operations
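The contraction operation of Figure 2 can be sketched in a few lines of Python. This is an illustrative encoding rather than the implementation of any existing tool: bunches are represented as nested tuples, the exponent in c(p, q) is read as the absolute value |1/q − 1/p| with 1/∞ = 0, and the names `lp_norm`, `c`, `scale`, and `contr` are assumptions of this sketch.

```python
import math

INF = math.inf

# Bunch encoding assumed for this sketch:
#   ()                       the empty bunch
#   ("leaf", x, tau, s)      [x : tau]_s
#   ("comma", q, G1, G2)     G1 ,_q G2

def lp_norm(p, s, r):
    """L^p norm of the pair (s, r); p = inf is read as the maximum."""
    if p == INF:
        return max(s, r)
    if s == INF or r == INF:
        return INF
    return (s ** p + r ** p) ** (1 / p)

def c(p, q):
    """Scaling factor c(p, q), taking 1/inf = 0."""
    if p == INF:
        return 1.0
    inv_q = 0.0 if q == INF else 1 / q
    return 2 ** abs(inv_q - 1 / p)

def scale(s, b):
    """Scale all sensitivities by a finite factor s >= 1."""
    if b == ():
        return b
    if b[0] == "leaf":
        return ("leaf", b[1], b[2], s * b[3])
    return ("comma", b[1], scale(s, b[2]), scale(s, b[3]))

def contr(p, g, d):
    """Contr(p, g, d); fails unless g and d have the same structure."""
    if g == () or d == ():
        if g == d:
            return ()
        raise ValueError("bunches differ in structure")
    if g[0] == "leaf" and d[0] == "leaf" and g[2] == d[2]:
        # Keep the variable name of the left bunch.
        return ("leaf", g[1], g[2], lp_norm(p, g[3], d[3]))
    if g[0] == "comma" and d[0] == "comma" and g[1] == d[1]:
        q = g[1]
        merged = ("comma", q, contr(p, g[2], d[2]), contr(p, g[3], d[3]))
        return scale(c(p, q), merged)
    raise ValueError("bunches differ in structure")

g = ("comma", 2, ("leaf", "x1", "R", 2 * math.sqrt(2)), ("leaf", "y1", "R", math.sqrt(2)))
d = ("comma", 2, ("leaf", "x2", "R", math.sqrt(2)), ("leaf", "y2", "R", 2 * math.sqrt(2)))
print(contr(2, g, d))
```

With p = q = 2 the factor c(2, 2) is 1, so each merged leaf simply receives the L<sup>2</sup> norm of the two sensitivities being contracted.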

Like in Fuzz, the !E rule propagates the scaling factor, but using the bunch structure. Rather than adding the two environments, we splice one into the other: the notation Γ(∆) denotes a compound bunch where we plug in the bunch ∆ into another bunch Γ(⋆) that has a single, distinguished hole ⋆. As we mentioned earlier, Bunched Fuzz has an explicit typing rule for contraction, whereas contraction in Fuzz is implicit in rules with multiple premises. Note also that we have unrestricted weakening. Finally, we have the rules for typing the return and bind primitives of the probabilistic types ○<sub>H</sub> and ○<sub>P</sub>. Those for ○<sub>P</sub> are adapted from Fuzz, by using contraction instead of adding up the environments. The ones for ○<sub>H</sub> are similar, but use L<sup>2</sup> contraction instead, since that is the metric that enables composition for the Hellinger distance.

Let us now explain in which sense ⊗<sub>∞</sub> corresponds to the & connective of Fuzz. We will need the following lemma:

Lemma 1 (Renaming). Assume that there is a type derivation of Γ ⊢ e : τ and that Γ ≈ Γ′. Then there exists a derivation of Γ′ ⊢ e[vars(Γ′)/vars(Γ)] : τ.

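$$\dfrac{s \geq 1}{[x : \tau]\_s \vdash x : \tau}\ \text{Axiom} \qquad \dfrac{}{\cdot \vdash r : \mathbb{R}}\ \mathbb{R}I \qquad \dfrac{}{\cdot \vdash () : 1}\ 1I$$

$$\dfrac{\Gamma \,,\_p\, [x : \tau]\_1 \vdash e : \sigma}{\Gamma \vdash \lambda x.e : \tau \multimap\_p \sigma}\ {\multimap}I \qquad \dfrac{\Gamma \vdash f : \tau \multimap\_p \sigma \quad \Delta \vdash e : \tau}{\Gamma \,,\_p\, \Delta \vdash f\,e : \sigma}\ {\multimap}E$$

$$\dfrac{\Gamma \vdash e\_1 : \tau \quad \Delta \vdash e\_2 : \sigma}{\Gamma \,,\_p\, \Delta \vdash (e\_1, e\_2) : \tau \otimes\_p \sigma}\ {\otimes}I \qquad \dfrac{\Delta \vdash e\_1 : \tau \otimes\_p \sigma \quad \Gamma([x : \tau]\_s \,,\_p\, [y : \sigma]\_s) \vdash e\_2 : \rho}{\Gamma(s\Delta) \vdash \mathsf{let}\,(x, y) = e\_1 \,\mathsf{in}\, e\_2 : \rho}\ {\otimes}E$$

$$\dfrac{\Gamma \vdash e : \tau}{\Gamma \vdash \mathsf{inj}\_1\, e : \tau \oplus \sigma}\ {\oplus}\_1 I \qquad \dfrac{\Gamma \vdash e : \sigma}{\Gamma \vdash \mathsf{inj}\_2\, e : \tau \oplus \sigma}\ {\oplus}\_2 I$$

$$\dfrac{\Gamma \vdash e\_1 : \tau \oplus \sigma \quad \Delta([x : \tau]\_s) \vdash e\_2 : \rho \quad \Delta([y : \sigma]\_s) \vdash e\_3 : \rho}{\Delta(s\Gamma) \vdash \mathsf{case}\ e\_1 \,\mathsf{of}\, x.\,e\_2 \mid y.\,e\_3 : \rho}\ {\oplus}E$$

$$\dfrac{\Gamma \vdash e : \tau}{s\Gamma \vdash\ !e :\ !\_s \tau}\ {!}I \qquad \dfrac{\Gamma \vdash e\_1 :\ !\_r \tau \quad \Delta([x : \tau]\_{rs}) \vdash e\_2 : \sigma}{\Delta(s\Gamma) \vdash \mathsf{let}\ !x = e\_1 \,\mathsf{in}\, e\_2 : \sigma}\ {!}E$$

$$\dfrac{\Gamma(\Delta \,,\_p\, \Delta') \vdash e : \tau \quad \Delta \approx \Delta'}{\Gamma(\operatorname{Contr}(p, \Delta, \Delta')) \vdash e[\operatorname{vars}(\Delta')/\operatorname{vars}(\Delta)] : \tau}\ \text{Contr} \qquad \dfrac{\Gamma(\cdot) \vdash e : \tau}{\Gamma(\Delta) \vdash e : \tau}\ \text{Weak} \qquad \dfrac{\Gamma \vdash e : \tau \quad \Gamma \leftrightsquigarrow \Gamma'}{\Gamma' \vdash e : \tau}\ \text{Exch}$$

$$\dfrac{\Gamma \approx \Delta \quad \Gamma \vdash e\_1 : \bigcirc\_P \tau \quad \Delta \,,\_p\, [x : \tau]\_s \vdash e\_2 : \bigcirc\_P \sigma}{\operatorname{Contr}(1, \Gamma, \Delta) \vdash \mathsf{mlet}\ x = e\_1 \,\mathsf{in}\, e\_2 : \bigcirc\_P \sigma}\ \text{Bind-P} \qquad \dfrac{\Gamma \vdash e : \tau}{\infty\Gamma \vdash \mathsf{return}\ e : \bigcirc\_P \tau}\ \text{Return-P}$$

$$\dfrac{\Gamma \approx \Delta \quad \Gamma \vdash e\_1 : \bigcirc\_H \tau \quad \Delta \,,\_p\, [x : \tau]\_s \vdash e\_2 : \bigcirc\_H \sigma}{\operatorname{Contr}(2, \Gamma, \Delta) \vdash \mathsf{mlet}\ x = e\_1 \,\mathsf{in}\, e\_2 : \bigcirc\_H \sigma}\ \text{Bind-H} \qquad \dfrac{\Gamma \vdash e : \tau}{\infty\Gamma \vdash \mathsf{return}\ e : \bigcirc\_H \tau}\ \text{Return-H}$$

Fig. 3. Bunched Fuzz typing rules

Now, the & connective in Fuzz supports two operations, projections and pairing. The connective ⊗<sub>∞</sub> of Bunched Fuzz also supports these operations, but as derived forms. First, projections can be encoded by defining π<sub>i</sub>(e) for i = 1, 2 as let (x<sub>1</sub>, x<sub>2</sub>) = e in x<sub>i</sub>. Second, for pairing, assume we have two derivations of Γ ⊢ e<sub>i</sub> : σ<sub>i</sub> for i = 1, 2, and let Γ′ be an environment obtained from Γ by renaming all variables to fresh ones. Then we have Γ ≈ Γ′ and thus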

$$\dfrac{\dfrac{\Gamma \vdash e\_1 : \sigma\_1 \qquad \dfrac{\Gamma \vdash e\_2 : \sigma\_2 \quad \Gamma \approx \Gamma'}{\Gamma' \vdash e\_2[\operatorname{vars}(\Gamma')/\operatorname{vars}(\Gamma)] : \sigma\_2}}{\Gamma \,,\_\infty\, \Gamma' \vdash (e\_1, e\_2[\operatorname{vars}(\Gamma')/\operatorname{vars}(\Gamma)]) : \sigma\_1 \otimes\_{\infty} \sigma\_2}}{\operatorname{Contr}(\infty, \Gamma, \Gamma') \vdash (e\_1, e\_2) : \sigma\_1 \otimes\_{\infty} \sigma\_2}$$

Note that we have defined <sup>∞</sup>√(x<sup>∞</sup> + y<sup>∞</sup>) = max(x, y) by taking the limit of <sup>p</sup>√(x<sup>p</sup> + y<sup>p</sup>) when p goes to infinity, and thus we have Contr(∞, Γ, Γ′) = Γ. Therefore the pairing rule of & is derivable for ⊗<sub>∞</sub>.

# 4 Semantics

Having defined the syntax of Bunched Fuzz and its type system, we are ready to present its semantics. We opt for a denotational formulation, where types τ and bunches Γ are interpreted as metric spaces ⟦τ⟧ and ⟦Γ⟧, and a derivation π of Γ ⊢ e : τ is interpreted as a non-expansive function ⟦π⟧ : ⟦Γ⟧ → ⟦τ⟧. For space reasons, we do not provide an operational semantics for the language, but we foresee no major difficulties in doing so, since the term language is mostly inherited from Fuzz, which does have a denotational semantics proved sound with respect to an operational semantics [4].

Types Each type τ is interpreted as a metric space ⟦τ⟧ in a compositional fashion, by mapping each type constructor to the corresponding operation on metric spaces defined in Figure 4. We now explain these definitions.

The operations in the first four lines of Figure 4 come from prior work on Fuzz [4,3]. The definition of ⊗<sub>p</sub> uses as carrier set the cartesian product, just as ⊗ in previous works, but endows it with the L<sup>p</sup> distance, defined in Section 2.4. In the particular case of p = 1, ⊗<sub>1</sub> is the same as ⊗.

As for ⊸<sub>p</sub>, we want to define it in such a way that currying and uncurrying work with respect to ⊗<sub>p</sub>, which will allow us to justify the introduction and elimination forms for that connective. For that we first choose as carrier set the set A ⊸ B of non-expansive functions from A to B. This set carries the metric

$$d\_{A \multimap\_p B}(f, g) = \inf \{ r \in \mathbb{R}\_{\infty}^{\ge 0} \mid \forall x, y \in A,\ d\_B(f(x), g(y)) \le \sqrt[p]{r^p + d\_A(x, y)^p} \} \tag{11}$$

This metric is dictated by the type of the application operator in the L<sup>p</sup> norm: (A ⊸<sub>p</sub> B) ⊗<sub>p</sub> A ⊸ B. Intuitively, if f and g are at distance r, and we want application to be non-expansive, we need to satisfy d<sub>B</sub>(f(x), g(y)) ≤ <sup>p</sup>√(r<sup>p</sup> + d<sub>A</sub>(x, y)<sup>p</sup>) for every x, y ∈ A. The above definition says that we pick the distance to be the smallest possible r that makes this work. Note that this choice is forced upon us: in category-theoretic jargon, the operations of currying and uncurrying, which are intimately tied to the application operator, correspond to an adjunction between two functors, which implies that any other metric space that yields a similar adjunction with respect to ⊗<sub>p</sub> must be isomorphic to ⊸<sub>p</sub>. In particular, this implies that its metric will be the same as the one of ⊸<sub>p</sub>.
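For intuition, the metric of Equation (11) can be evaluated by brute force when A is a finite set of reals and B is the reals, both with the usual distance |x − y|. The Python sketch below makes exactly these simplifying assumptions (and takes p finite); `fun_dist` is a hypothetical helper, not an operation of the language.

```python
def fun_dist(p, points, f, g):
    """Distance between f and g in A -o_p B for a finite A within the reals.

    The smallest r with |f(x) - g(y)| <= (r^p + |x - y|^p)^(1/p) for all
    x, y is the largest value of (|f(x) - g(y)|^p - |x - y|^p)^(1/p),
    taken over the pairs where that difference is positive.
    """
    best = 0.0
    for x in points:
        for y in points:
            gap = abs(f(x) - g(y)) ** p - abs(x - y) ** p
            if gap > 0:
                best = max(best, gap ** (1 / p))
    return best

points = [i / 10 for i in range(-20, 21)]   # grid on [-2, 2]

def f(x):
    return x / 2          # 1/2-sensitive, hence non-expansive

def g(x):
    return 1 - abs(x)     # non-expansive

d1 = fun_dist(1, points, f, g)
d2 = fun_dist(2, points, f, g)
sup_dist = max(abs(f(x) - g(x)) for x in points)
print(d1, d2, sup_dist)
```

On this grid the p = 1 distance coincides with the pointwise supremum of |f(x) − g(x)|, and the p = 2 distance is at least as large.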

For ○<sub>P</sub>A and ○<sub>H</sub>A the carrier set is the set DA of discrete distributions over A. As to the metric on the carrier set, the interpretation of ○<sub>P</sub> uses the max divergence, used in the definition of differential privacy (see Sect. 2.2). The interpretation of ○<sub>H</sub> uses instead the Hellinger distance (see e.g. [3]):

$$\mathsf{HD}\_{A}(\mu,\nu) \triangleq \sqrt{\frac{1}{2} \sum\_{x \in A} |\sqrt{\mu(x)} - \sqrt{\nu(x)}|^{2}} \tag{12}$$
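Equation (12) is straightforward to compute for finitely supported distributions. The following Python sketch represents a distribution as a dictionary from outcomes to probabilities; this representation and the function name `hellinger` are assumptions made for the illustration.

```python
import math

def hellinger(mu, nu):
    """Hellinger distance between two finitely supported distributions."""
    support = set(mu) | set(nu)
    total = sum(
        (math.sqrt(mu.get(x, 0.0)) - math.sqrt(nu.get(x, 0.0))) ** 2
        for x in support
    )
    return math.sqrt(total / 2)

fair = {"heads": 0.5, "tails": 0.5}
biased = {"heads": 0.9, "tails": 0.1}
always_heads = {"heads": 1.0}
always_tails = {"tails": 1.0}

print(hellinger(fair, biased))
print(hellinger(always_heads, always_tails))
```

The distance is 0 for identical distributions and reaches its maximum value 1 when the supports are disjoint, as in the last call.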


Fig. 4. Operations on metric spaces for interpreting types

Bunches The interpretation of bunches is similar to that of types. Variables correspond to scaled metric spaces, whereas ,<sub>p</sub> corresponds to ⊗<sub>p</sub>:

$$[\![\cdot]\!] = 1 \qquad [\![[x:\tau]\_s]\!] = !\_s [\![\tau]\!] \qquad [\![\Gamma\_1 \,,\_p\, \Gamma\_2]\!] = [\![\Gamma\_1]\!] \otimes\_p [\![\Gamma\_2]\!].$$

One complication compared to prior designs is the use of an explicit exchange rule, which is required to handle the richer structure of contexts. Semantically, each use of exchange induces an isomorphism of metric spaces:

Theorem 3. Each derivation of Γ ↭ ∆ corresponds to an isomorphism of metric spaces ⟦Γ⟧ ≅ ⟦∆⟧.

Before stating the interpretation of typing derivations, we give an overview of important properties of the above constructions that will help us prove the soundness of the interpretation.

Scaling Much like in prior work [4,3], we can check the following equations:

Proposition 4.

$$!\_{s\_1}!\_{s\_2}A = !\_{s\_1 \cdot s\_2}A \qquad !\_s(A \oplus B) = !\_s A \oplus !\_s B \qquad !\_s(A \otimes\_p B) = !\_s A \otimes\_p !\_s B.$$

Moreover, an s-sensitive function from A to B is the same thing as a non-expansive function of type !<sub>s</sub>A ⊸ B.

Proposition 5. For every bunch Γ, we have ⟦sΓ⟧ = !<sub>s</sub>⟦Γ⟧.

Tensors The properties of L<sup>p</sup> distances allow us to relate product types with different values of p.

Proposition 6 (Subtyping of tensors).

1. Let A, B be two metric spaces and p, q ∈ R<sup>≥1</sup><sub>∞</sub> with p ≤ q. Then the identity map on pairs belongs to the two following spaces:

$$A \otimes\_p B \multimap A \otimes\_q B \qquad \qquad !\_{2^{1/p - 1/q}} (A \otimes\_q B) \multimap A \otimes\_p B.$$

2. In particular, when p = 1 and q = 2, the identity map belongs to:

$$A \otimes\_1 B \multimap A \otimes\_2 B \qquad \qquad !\_{\sqrt{2}}(A \otimes\_2 B) \multimap A \otimes\_1 B.$$

Proof. For (1), the fact that the identity belongs to the first space follows from the fact that d<sub>q</sub>(x, y) ≤ d<sub>p</sub>(x, y), by Proposition 3 (Equation (9)). The second claim is derived from Proposition 3 (Equation (9)) in the case n = 2.
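The two inequalities behind Proposition 6, namely d<sub>q</sub> ≤ d<sub>p</sub> ≤ 2<sup>1/p − 1/q</sup> · d<sub>q</sub> for p ≤ q on pairs, can be spot-checked numerically. The Python sketch below does this on a few sample points of R × R; it is a sanity check under these assumptions rather than a proof, and `dist` and `check` are hypothetical helper names.

```python
import math

def dist(p, x, y):
    """L^p distance between two pairs of reals (p = inf is the maximum)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

def check(p, q, x, y, tol=1e-9):
    """Check d_q <= d_p <= 2^(1/p - 1/q) * d_q for p <= q on one pair."""
    inv_q = 0.0 if q == math.inf else 1 / q
    factor = 2 ** (1 / p - inv_q)
    dp, dq = dist(p, x, y), dist(q, x, y)
    return dq <= dp + tol and dp <= factor * dq + tol

samples = [
    ((0.0, 0.0), (3.0, 4.0)),
    ((1.0, -2.0), (1.0, 5.0)),
    ((2.0, 2.0), (-1.0, 5.0)),   # equal coordinate gaps: the bound is tight
]
exponents = [(1, 2), (1, math.inf), (2, 3), (2, math.inf)]

ok = all(check(p, q, x, y) for p, q in exponents for x, y in samples)
print(ok)
```

The third sample has equal gaps in both coordinates, which is the case where the factor 2<sup>1/p − 1/q</sup> is attained.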

Remark 1. Proposition 6 allows us to relate different spaces of functions with multiple arguments. For example,

$$(A \otimes\_2 B \multimap C) \subseteq (A \otimes\_1 B \multimap C) \qquad (A \otimes\_1 B \multimap C) \subseteq (!\_{\sqrt{2}}(A \otimes\_2 B) \multimap C).$$

Bunched Fuzz does not currently exploit these inclusions in any significant way, but we could envision extending the system with a notion of subtyping to further simplify the use of multiple product metrics in a single program.

We also have the following result, which is instrumental to prove the soundness of the contraction rule.

Proposition 7. Let X, Y, Z, W be metric spaces, and p, q ∈ R<sup>≥1</sup><sub>∞</sub> with p ≠ ∞. The canonical isomorphism of sets (X × Y) × (Z × W) ≅ (X × Z) × (Y × W), which swaps the second and third components, is a non-expansive function of type !<sub>c(p,q)</sub>((X ⊗<sub>q</sub> Y) ⊗<sub>p</sub> (Z ⊗<sub>q</sub> W)) → (X ⊗<sub>p</sub> Z) ⊗<sub>q</sub> (Y ⊗<sub>p</sub> W), where c(p, q) is defined as in Figure 2.

Proof. First, suppose that p ≤ q. Then we can write the isomorphism as a composite of the following non-expansive functions:

$$\begin{array}{ll} !\_{c(p,q)}((X\otimes\_{q}Y)\otimes\_{p}(Z\otimes\_{q}W)) \\ \to\ !\_{c(p,q)}((X\otimes\_{q}Y)\otimes\_{q}(Z\otimes\_{q}W)) & \text{Proposition 6} \\ \cong\ !\_{c(p,q)}((X\otimes\_{q}Z)\otimes\_{q}(Y\otimes\_{q}W)) & \text{assoc., comm. of }\otimes\_{q} \\ =\ !\_{c(p,q)}(X\otimes\_{q}Z)\otimes\_{q}\,!\_{c(p,q)}(Y\otimes\_{q}W) & \text{Proposition 4} \\ \to (X\otimes\_{p}Z)\otimes\_{q}(Y\otimes\_{p}W) & \text{Proposition 6.} \end{array}$$

Otherwise, p > q, and we reason as follows.

$$\begin{array}{ll} !\_{c(p,q)}((X\otimes\_{q}Y)\otimes\_{p}(Z\otimes\_{q}W)) \\ \to\ !\_{c(p,q)}((X\otimes\_{p}Y)\otimes\_{p}(Z\otimes\_{p}W)) & \text{Proposition 6} \\ \cong\ !\_{c(p,q)}((X\otimes\_{p}Z)\otimes\_{p}(Y\otimes\_{p}W)) & \text{assoc., comm. of }\otimes\_{p} \\ \to (X\otimes\_{p}Z)\otimes\_{q}(Y\otimes\_{p}W) & \text{Proposition 6.} \end{array}$$

One can then prove the following property:

Proposition 8. Suppose that we have two bunches Γ ≈ ∆. The carrier sets of ⟦Γ⟧ and ⟦∆⟧ are the same. Moreover, for any p, the diagonal function δ(x) = (x, x) is a non-expansive function of type ⟦Contr(p, Γ, ∆)⟧ → ⟦Γ⟧ ⊗<sub>p</sub> ⟦∆⟧.

Function Types The metric on ⊸<sub>p</sub> can be justified by the following result:

Proposition 9. For every metric space X and every p ∈ R<sup>≥1</sup><sub>∞</sub>, there is an adjunction of type (−) ⊗<sub>p</sub> X ⊣ X ⊸<sub>p</sub> (−) in Met given by currying and uncurrying. (Both constructions on metric spaces are extended to endofunctors on Met in the obvious way.)

Because right adjoints are unique up to isomorphism, this definition is a direct generalization of the metric on functions used in Fuzz [23,4,3], which corresponds to ⊸<sub>1</sub>.

Theorem 4. Suppose that A and B are proper metric spaces, and let f, g : A → B be non-expansive. Then d<sub>A⊸<sub>1</sub>B</sub>(f, g) = sup<sub>x</sub> d<sub>B</sub>(f(x), g(x)).

We conclude with another subtyping result involving function spaces.

Theorem 5. For all non-expansive functions f, g ∈ A → B and p ≥ 1, we have d<sub>A⊸<sub>1</sub>B</sub>(f, g) ≤ d<sub>A⊸<sub>p</sub>B</sub>(f, g). In particular, the identity function is a non-expansive function of type (A ⊸<sub>p</sub> B) → (A ⊸<sub>1</sub> B).

Probability Distributions Prior work [3] proves that the return and bind operations on probability distributions can be seen as non-expansive functions:

$$\eta: !\_{\infty}A \to \bigcirc\_{P} A$$

$$(-)^{\dagger}(-): (!\_{\infty}A \multimap\_{1} \bigcirc\_{P} B) \otimes\_{1} \bigcirc\_{P} A \to \bigcirc\_{P} B.$$

These properties ensure the soundness of the typing rules for ○<sub>P</sub> in Fuzz, and also in Bunched Fuzz. For ○<sub>H</sub>, we can use the following composition principle.

Theorem 6. The following types are sound for the monadic operations on distributions, seen as non-expansive operations, for any p ≥ 1:

$$\eta: !\_{\infty}A \to \bigcirc\_{H} A$$

$$(-)^{\dagger}(-): (!\_{\infty}A \multimap\_{p} \bigcirc\_{H} B) \otimes\_{2} \bigcirc\_{H} A \to \bigcirc\_{H} B.$$

Derivations Finally, a derivation tree builds a function from the context's space to the subject's space. In the following definition, we use the metavariables γ and δ to denote variable assignments, that is, mappings from the variables of environments Γ and ∆ to elements of the corresponding metric spaces. We use γ(δ) to represent an assignment in ⟦Γ(∆)⟧ that is decomposed into two assignments γ(⋆) and δ corresponding to the Γ(⋆) and ∆ portions. Finally, we use the λ-calculus notation f x to denote a function f being applied to the value x.

Definition 1. Given a derivation π proving Γ ⊢ e : τ, its interpretation ⟦π⟧ ∈ ⟦Γ⟧ → ⟦τ⟧ is given by structural induction on π as follows:

$$\begin{aligned}
[\![\text{Axiom}]\!] &\triangleq \lambda x.\, x \\
[\![\mathbb{R}I]\!] &\triangleq \lambda ().\, r \in \mathbb{R} \\
[\![{\multimap}I\ \pi]\!] &\triangleq \lambda \gamma.\, \lambda x.\, [\![\pi]\!]\,(\gamma, x) \\
[\![{\multimap}E\ \pi\_1\ \pi\_2]\!] &\triangleq \lambda (\gamma, \delta).\, [\![\pi\_2]\!]\,\gamma\,([\![\pi\_1]\!]\,\delta) \\
[\![1I]\!] &\triangleq \lambda ().\, () \\
[\![{\otimes}I\ \pi\_1\ \pi\_2]\!] &\triangleq \lambda (\gamma, \delta).\, ([\![\pi\_1]\!]\,\gamma,\ [\![\pi\_2]\!]\,\delta) \\
[\![{\otimes}E\ \pi\_1\ \pi\_2]\!] &\triangleq \lambda \gamma(\delta).\, [\![\pi\_2]\!]\,\gamma([\![\pi\_1]\!]\,\delta) \\
[\![{\oplus}\_i I\ \pi]\!] &\triangleq \lambda \gamma.\, \mathsf{inj}\_i\,([\![\pi]\!]\,\gamma) \\
[\![{\oplus}E\ \pi\_1\ \pi\_2\ \pi\_3]\!] &\triangleq \lambda \delta(\gamma).\, [\,[\![\pi\_2]\!], [\![\pi\_3]\!]\,](\delta([\![\pi\_1]\!]\,\gamma)) \\
[\![{!}I\ \pi]\!] &\triangleq [\![\pi]\!] \\
[\![{!}E\ \pi\_1\ \pi\_2]\!] &\triangleq \lambda \delta(\gamma).\, [\![\pi\_2]\!]\,\delta([\![\pi\_1]\!]\,\gamma) \\
[\![\text{Contr}\ \pi]\!] &\triangleq \lambda \gamma(\delta).\, [\![\pi]\!]\,\gamma(\delta, \delta) \\
[\![\text{Weak}\ \pi]\!] &\triangleq \lambda \gamma(\delta).\, [\![\pi]\!]\,\gamma(()) \\
[\![\text{Exch}\ \pi]\!] &\triangleq \lambda \gamma'.\, [\![\pi]\!]\,\phi\_{\Gamma'/\Gamma}(\gamma') \\
[\![\text{Bind-P}\ \pi\_1\ \pi\_2]\!] &\triangleq \lambda \gamma'.\, ([\![\pi\_2]\!]\,\gamma')^{\dagger}\,([\![\pi\_1]\!]\,\gamma') \\
[\![\text{Return-P}\ \pi]\!] &\triangleq \lambda \gamma.\, \eta([\![\pi]\!]\,\gamma)
\end{aligned}$$

where in ⟦Exch π⟧, the map ϕ<sub>Γ′/Γ</sub> is the isomorphism defined by Theorem 3, and for the last two cases see the definitions in Equations (3) and (4) (Bind-H and Return-H are defined in the same way).

Theorem 7 (Soundness). Given a derivation π proving Γ ⊢ e : τ, ⟦π⟧ is a non-expansive function from the space ⟦Γ⟧ to the space ⟦τ⟧.

# 5 Examples

We now look at examples of programs that illustrate the use of L<sup>p</sup> metrics.

Currying and Uncurrying Let us illustrate the use of higher-order functions with combinators for currying and uncurrying.

$$\begin{aligned} & \operatorname{curry} : ((\tau \otimes\_p \sigma) \multimap\_p \rho) \multimap (\tau \multimap\_p \sigma \multimap\_p \rho) \\ & \operatorname{curry}\ f\ x\ y = f(x, y) \\ & \operatorname{uncurry} : (\tau \multimap\_p \sigma \multimap\_p \rho) \multimap ((\tau \otimes\_p \sigma) \multimap\_p \rho). \end{aligned}$$

Note that the indices on ⊗ and ⊸ need to be the same. The reason can be traced back to the ⊸E rule (cf. Figure 3), which uses the ,<sub>p</sub> connective to eliminate ⊸<sub>p</sub> (see the appendix of the full paper for a detailed derivation of currying and uncurrying). If the indices do not agree, currying is not possible; in other words, we cannot in general soundly curry a function of type τ ⊗<sub>p</sub> σ ⊸<sub>q</sub> ρ to obtain something of type τ ⊸<sub>p</sub> σ ⊸<sub>q</sub> ρ. However, if q ≤ p, it would be possible to soundly view τ ⊗<sub>q</sub> σ as a subtype of τ ⊗<sub>p</sub> σ, thanks to Proposition 6. In this case, we could then convert from τ ⊗<sub>p</sub> σ ⊸<sub>q</sub> ρ to τ ⊗<sub>q</sub> σ ⊸<sub>q</sub> ρ (note the variance), and then curry to obtain a function of type τ ⊸<sub>q</sub> σ ⊸<sub>q</sub> ρ.

Precise sensitivity for functions with multiple arguments Another useful feature of Bunched Fuzz is that its contraction rule allows us to split sensitivities more accurately than if we used the contraction rule that is derivable in the original Fuzz. Concretely, suppose that we have a program λp. let (x, y) = p in f(x, y) + g(x, y), where f and g have types f : (!<sub>2</sub>R) ⊗<sub>2</sub> R ⊸ R and g : R ⊗<sub>2</sub> (!<sub>2</sub>R) ⊸ R, and where we have elided the wrapping and unwrapping of ! types, for simplicity.

Let us sketch how this program is typed in Bunched Fuzz. Addition belongs to R ⊗<sub>1</sub> R ⊸ R, so by Proposition 6 it can also be given the type !<sub>√2</sub>(R ⊗<sub>2</sub> R) ⊸ R. Thus, we can build the following derivation for the body of the program:

$$\dfrac{\Gamma \vdash f(x\_1, y\_1) + g(x\_2, y\_2) : \mathbb{R}}{[x : \mathbb{R}]\_{\sqrt{10}} \,,\_2\, [y : \mathbb{R}]\_{\sqrt{10}} \vdash f(x, y) + g(x, y) : \mathbb{R}}\ \text{Contr}$$

where Γ = ([x<sub>1</sub> : R]<sub>2√2</sub> ,<sub>2</sub> [y<sub>1</sub> : R]<sub>√2</sub>) ,<sub>2</sub> ([x<sub>2</sub> : R]<sub>√2</sub> ,<sub>2</sub> [y<sub>2</sub> : R]<sub>2√2</sub>), and where we used contraction twice to merge the xs and ys. Note that ||(2√2, √2)||<sub>2</sub> = √(8 + 2) = √10, which is why the final sensitivities have this form. By contrast, consider how we might attempt to type this program directly in the original Fuzz. Let us assume that we are working in an extension of Fuzz with types for expressing the domains of f and g, similarly to the L<sup>2</sup> vector types of Duet [20]. Moreover, let us assume that we have coercion functions that allow us to cast from (!<sub>2</sub>R) ⊗<sub>2</sub> (!<sub>2</sub>R) to (!<sub>2</sub>R) ⊗<sub>2</sub> R and R ⊗<sub>2</sub> (!<sub>2</sub>R). If we have a pair p : !<sub>2</sub>((!<sub>2</sub>R) ⊗<sub>2</sub> (!<sub>2</sub>R)), we can split its sensitivity to call f and g and then combine their results with addition. However, this type is equivalent to !<sub>4</sub>(R ⊗<sub>2</sub> R), which means that the program was given a worse sensitivity (since √10 < 4). Of course, it would also have been possible to extend Fuzz with a series of primitives that implement precisely the management of sensitivities performed by bunches. However, here this low-level reasoning is handled directly by the type system.

Programming with matrices The Duet language [20] provides several matrix types with the $L^1$, $L^2$, or $L^\infty$ metrics, along with primitive functions for manipulating them. In Bunched Fuzz, these types can be defined directly as follows: $\mathbb{M}_p[m, n] = \otimes_1^m \otimes_p^n \mathbb{R}$. Following Duet, we use the $L^1$ distance to combine the rows and the $L^p$ distance to combine the columns. One advantage of having types for matrices defined in terms of more basic constructs is that we can program functions for manipulating them directly, without resorting to separate primitives. For example, we can define the following terms in the language:

$$\begin{aligned} &\mathit{addrow} : \mathbb{M}_p[1, n] \otimes_1 \mathbb{M}_p[m, n] \multimap \mathbb{M}_p[m+1, n] \\ &\mathit{addcolumn} : \mathbb{M}_1[1, m] \otimes_1 \mathbb{M}_1[m, n] \multimap \mathbb{M}_1[m, n+1] \\ &\mathit{addition} : \mathbb{M}_1[m, n] \otimes_1 \mathbb{M}_1[m, n] \multimap \mathbb{M}_1[m, n]. \end{aligned}$$

The first program, addrow, appends a vector, represented as a $1 \times n$ matrix, to the first row of an $m \times n$ matrix. The second program, addcolumn, is similar, but appends the vector as a column rather than a row. Because of that, it is restricted to $L^1$ matrices. Finally, the last program, addition, adds the elements of two matrices pointwise.

Vector addition over sets Let us now show an example of a Fuzz term for which using $L^p$ metrics allows us to obtain a finer sensitivity analysis. We consider sets of vectors in $\mathbb{R}^d$ and the function vectorSum which, given such a set, returns the vectorial sum of its elements. In Fuzz, this function can be defined via a summation primitive $\mathit{sum} : {!_\infty}({!_\infty}\tau \multimap \mathbb{R}) \multimap \mathsf{set}\ \tau \multimap \mathbb{R}$, which adds up the results of applying a function to each element of a set [23]. The definition is:

$$\begin{aligned} \mathit{vectorSum} &: {!_d}\, \mathsf{set}(\otimes_1^d \mathbb{R}) \multimap \otimes_1^d \mathbb{R} \\ \mathit{vectorSum}\ s &= (\mathit{sum}\ \pi_1\ s, \ldots, \mathit{sum}\ \pi_d\ s). \end{aligned}$$

Here, $\pi_i : \otimes_1^d \mathbb{R} \multimap \mathbb{R}$ denotes the $i$-th projection, which can be defined by destructing a product. Set types in Fuzz are equipped with the Hamming metric [23], where the distance between two sets is the number of elements by which they differ. Note that, to ensure that sum has bounded sensitivity, we need to clip the results of its function argument to the interval $[-1, 1]$. Fuzz infers a sensitivity of $d$ for this function because its argument is used with sensitivity 1 in each component of the tuple. In Bunched Fuzz, we can define the same function as above, but we also have the option of using a different $L^p$ distance to define vectorSum, which leads to the type ${!_{d^{1/p}}}\, \mathsf{set}(\otimes_p^d \mathbb{R}) \multimap \otimes_p^d \mathbb{R}$, with a sensitivity of $d^{1/p}$. For the sake of readability, we'll show how this term is typed in the case $d = 2$. By typing each term $(\mathit{sum}\ \pi_i\ z_i)$ and applying ($\otimes$I) we get:

$$[z_1 : \mathsf{set}(\mathbb{R} \otimes_p \mathbb{R})]_1 \,,_p\, [z_2 : \mathsf{set}(\mathbb{R} \otimes_p \mathbb{R})]_1 \vdash (\mathit{sum}\ \pi_1\ z_1, \mathit{sum}\ \pi_2\ z_2) : \mathbb{R} \otimes_p \mathbb{R}.$$

By applying contraction we get: $[z : \mathsf{set}(\mathbb{R} \otimes_p \mathbb{R})]_{2^{1/p}} \vdash (\mathit{sum}\ \pi_1\ z, \mathit{sum}\ \pi_2\ z) : \mathbb{R} \otimes_p \mathbb{R}$. The claimed type is finally obtained by (!E) and ($\multimap$I).
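As a numerical illustration of the $d^{1/p}$ bound (not a substitute for the typing derivation), the sketch below models sets as Python lists, clips each coordinate to $[-1, 1]$ as required above, and measures how far the vector sum moves in $L^p$ distance when one element is added. All function names are hypothetical.

```python
def clip(v):
    """Clip a real number to the interval [-1, 1]."""
    return max(-1.0, min(1.0, v))


def vector_sum(vectors, d):
    """Coordinate-wise sum of clipped d-dimensional vectors."""
    return [sum(clip(vec[i]) for vec in vectors) for i in range(d)]


def lp_distance(a, b, p):
    """L^p distance between two vectors of equal length."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


d, p = 3, 2
base = [(0.5, -0.25, 1.0), (-1.0, 0.0, 0.75)]
# Adding one element gives a data set at Hamming distance 1; the all-ones
# vector is a worst case for the change of the sum.
neighbour = base + [(1.0, 1.0, 1.0)]

moved = lp_distance(vector_sum(base, d), vector_sum(neighbour, d), p)
bound = d ** (1.0 / p)
print(moved, bound)
```

In this worst case the two printed values coincide; for $p = 1$ the same experiment would recover the sensitivity $d$ inferred by Fuzz.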

Computing distances Suppose that the type $X$ denotes a proper metric space (that is, where the triangle inequality holds). Then we can incorporate its distance function in Bunched Fuzz with the type $X \otimes_1 X \multimap \mathbb{R}$. Indeed, let $x$, $x'$, $y$ and $y'$ be arbitrary elements of $X$. Then

$$\begin{aligned} d\_X(x,y) - d\_X(x',y') &\le d\_X(x,x') + d\_X(x',y') + d\_X(y',y) - d\_X(x',y') \\ &= d\_X(x,x') + d\_X(y,y') = d\_1((x,y),(x',y')). \end{aligned}$$

By symmetry, we also know that $d_X(x', y') - d_X(x, y) \le d_1((x, y), (x', y'))$. Combined, these two facts show

$$d\_{\mathbb{R}}(d\_X(x,y), d\_X(x',y')) = |d\_X(x,y) - d\_X(x',y')| \le d\_1((x,y),(x',y')),$$

which proves that $d_X$ is indeed a non-expansive function.
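The inequality can also be spot-checked on a concrete metric space. The following sketch assumes $X = \mathbb{R}$ with the absolute-value metric (an illustrative choice, not taken from the text) and tests the bound on randomly drawn points.

```python
import random


def d_x(a, b):
    """Distance on X = R (assumed example metric)."""
    return abs(a - b)


def d_1(pair, other):
    """L^1 distance on pairs: the sum of the component distances."""
    return d_x(pair[0], other[0]) + d_x(pair[1], other[1])


rng = random.Random(0)
violations = 0
for _ in range(10_000):
    x, y, x2, y2 = (rng.uniform(-10, 10) for _ in range(4))
    lhs = abs(d_x(x, y) - d_x(x2, y2))
    # Small tolerance to absorb floating-point rounding.
    if lhs > d_1((x, y), (x2, y2)) + 1e-12:
        violations += 1

print(violations)
```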

Calibrating noise to $L^p$ distance Hardt and Talwar [17] have proposed a generalization of the Laplace mechanism, called the K-norm mechanism, to create a differentially private variant of a database query $f : \mathsf{db} \to \mathbb{R}^d$. The difference is that the amount of noise added is calibrated to the sensitivity of $f$ measured with the $K$ norm, as opposed to the $L^1$ distance used in the original Laplace mechanism. When $K$ corresponds to the $L^p$ norm, we will call this the $L^p$-mechanism, following Awan and Slavkovich [1].

Definition 2. Given $f : \mathsf{db} \to \mathbb{R}^d$ with $L^p$ sensitivity $s$ and $\epsilon > 0$, the $L^p$-mechanism is a mechanism that, given a database $D \in \mathsf{db}$, returns a probability distribution over $y \in \mathbb{R}^d$ with density given by:

$$\frac{\exp(\frac{-\epsilon||f(D)-y||\_p}{2s})}{\int \exp(\frac{-\epsilon||f(D)-y||\_p}{2s}) dy}$$
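As a sanity check on this definition, consider the special case $p = 1$ (a worked instance added here for illustration). The $L^1$ norm is a sum over coordinates, and $\int \exp(-\epsilon |t| / (2s))\, dt = 4s/\epsilon$, so the density factorises into $d$ independent Laplace densities with scale $2s/\epsilon$, each centred at the corresponding coordinate of $f(D)$:

$$\frac{\exp\left(\frac{-\epsilon \|f(D) - y\|_1}{2s}\right)}{\int \exp\left(\frac{-\epsilon \|f(D) - y\|_1}{2s}\right) dy} = \prod_{j=1}^{d} \frac{\epsilon}{4s} \exp\left(\frac{-\epsilon\, |f(D)_j - y_j|}{2s}\right).$$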

This mechanism returns with high probability (which depends on $\epsilon$ and on the sensitivity $s$) a vector $y \in \mathbb{R}^d$ which is close to $f(D)$ in $L^p$ distance. Such a mechanism can be easily integrated in Bunched Fuzz through a primitive:

$$\mathsf{LpMech} : {!_\infty}({!_s}\mathsf{dB} \multimap \otimes_p^d \mathbb{R}) \multimap {!_\epsilon}\mathsf{dB} \multimap \bigcirc_P(\otimes_p^d \mathbb{R}).$$

(Strictly speaking, we would need some discretized version of the above distribution to incorporate the mechanism in Bunched Fuzz, but we'll ignore this issue in what follows.) The fact that LpMech satisfies $\epsilon$-differential privacy follows from the fact that this mechanism is an instance of the exponential mechanism [18], a basic building block of differential privacy. It is based on a scoring function assigning a score to every pair consisting of a database and a potential output, and it attempts to return an output with approximately maximal score, given the input database. As shown by Gaboardi et al. [13], the exponential mechanism can be added as a primitive to Fuzz with type:

$$\mathsf{expmech} : {!_\infty}\mathsf{set}(\mathcal{O}) \multimap {!_\infty}({!_\infty}\mathcal{O} \multimap \mathsf{dB} \multimap \mathbb{R}) \multimap \mathsf{dB} \multimap \bigcirc_P \mathcal{O},$$

where $\mathcal{O}$ is the type of outputs. The function LpMech is an instance of the exponential mechanism where $\mathcal{O}$ is $\otimes_p^d \mathbb{R}$ and the score is $\lambda y.\lambda D.\,\|f(D) - y\|_p$.

To define the $L^p$ mechanism with this recipe, we need to reason about the sensitivity of this scoring function. In Fuzz, this would not be possible, since the language does not support reasoning about the sensitivity of $f$ measured in the $L^p$ distance. In Bunched Fuzz, however, this can be done easily. Below, we will see an example (Gradient descent) of how the $L^p$ mechanism can lead to a finer privacy guarantee.

Gradient descent Let us now give an example where we use the $L^p$ mechanism. An example of differentially private gradient descent with a linear model in Fuzz was given in [25] (see Sect. 4.1, 4.2 and Fig. 6 p. 16, Fig. 8 p. 19). This algorithm proceeds by iteration. It was actually given for an extended language called Adaptive Fuzz, but the code already gives an algorithm in (plain) Fuzz. We refer the reader to this reference for the description of all functions, and here we will only describe how one can adapt the algorithm to Bunched Fuzz.

Given a set of $n$ records $x_i \in \mathbb{R}^d$, each with a label $y_i \in \mathbb{R}$, the goal is to find a parameter vector $\theta \in \mathbb{R}^d$ that minimizes the difference between the labels and their estimates, where the estimate of a label $y_i$ is the inner product $\langle x_i, \theta \rangle$. That is, the goal is to minimize the loss function $L(\theta, (x, y)) = \frac{1}{n} \cdot \sum_{i=1}^{n} (\langle x_i, \theta \rangle - y_i)^2$. The algorithm starts with an initial parameter vector $(0, \ldots, 0)$ and it iteratively produces successive $\theta$ vectors until a termination condition is reached.

The Fuzz program uses the data-type $\mathsf{bag}\ \tau$ representing bags or multisets over $\tau$. A bagmap primitive is given for it. The type $I$ is the unit interval $[0, 1]$. The main function is called updateParameter and updates one component of the model $\theta$; it is computed in the following way:
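For orientation, the non-private version of this procedure can be sketched in a few lines of Python. This is an illustrative sketch only, not the Fuzz program of [25]: the function names, the fixed step size and the fixed number of iterations (used in place of a termination condition) are assumptions made here.

```python
def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def loss(theta, xs, ys):
    """L(theta, (x, y)) = (1/n) * sum_i (<x_i, theta> - y_i)^2."""
    n = len(xs)
    return sum((inner(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient(theta, xs, ys):
    """Gradient of the loss: (2/n) * sum_i (<x_i, theta> - y_i) * x_i."""
    n = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = inner(x, theta) - y
        for j, x_j in enumerate(x):
            grad[j] += 2.0 * residual * x_j / n
    return grad


def gradient_descent(xs, ys, steps=200, step_size=0.5):
    theta = [0.0] * len(xs[0])  # initial parameter vector (0, ..., 0)
    for _ in range(steps):
        g = gradient(theta, xs, ys)
        theta = [t - step_size * g_j for t, g_j in zip(theta, g)]
    return theta


# Toy data with records in [0, 1]^2; the labels are generated by theta = (0.2, 0.4).
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
ys = [0.2, 0.4, 0.6, 0.2]
theta = gradient_descent(xs, ys)
print(theta, loss(theta, xs, ys))
```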


We modify the program as follows to check it in Bunched Fuzz and use the $L^p$-mechanism. Instead of computing over $\mathbb{R}$ we want to compute over $\otimes_p^d \mathbb{R}$ for a given $p \ge 1$, that is, $\mathbb{R}^d$ equipped with the $L^p$ distance. The records $x_i$ are in $\otimes_p^d I$ and the labels $y_i$ in $I$. The database type is $\mathsf{dB} = \mathsf{bag}\,(I \otimes_p (\otimes_p^d I))$. The distance between two bags in dB is the number of elements by which they differ.

We assume a primitive bagVectorSum with type ${!_{d^{1/p}}}\, \mathsf{bag}\,(\otimes_p^d I) \multimap \otimes_p^d \mathbb{R}$ (it could be defined as the vectorSum defined above for sets, using a sum primitive for bags). Given a bag $m$, (bagVectorSum $m$) returns the vectorial sum of all elements of $m$. We can check that the sensitivity of bagVectorSum is indeed $d^{1/p}$ because given two bags $m$ and $m'$ that are at distance 1, if we denote by $u$ the vector by which they differ, we have:

$$d_{\otimes_p^d \mathbb{R}}(\mathit{bagVectorSum}(m), \mathit{bagVectorSum}(m')) = \|u\|_p \le \Big(\sum_{j=1}^{d} 1\Big)^{1/p} = d^{1/p}$$

By adapting the calcGrad Fuzz term of [25] using bagVectorSum we obtain a term VectcalcGrad with the Bunched Fuzz type ${!_\infty} \otimes_p^d \mathbb{R} \multimap {!_{d^{1/p}}}\mathsf{db} \multimap \otimes_p^d \mathbb{R}$.

<sup>9</sup> Actually calcGrad computes $(\nabla L(\theta, (x, y)))_j$ up to a multiplicative constant, $2/n$, which is multiplied afterwards in the updateParameter function.

Given a vector $\theta$ and a database $(y, x)$, VectcalcGrad computes the updated vector $\theta'$. Finally we define the term updateVector by adding noise to VectcalcGrad using the $L^p$-mechanism. Recall the type of LpMech: ${!_\infty}({!_s}\mathsf{db} \multimap \otimes_p^d \mathbb{R}) \multimap {!_\epsilon}\mathsf{db} \multimap \bigcirc_P(\otimes_p^d \mathbb{R})$. We define updateVector and obtain its type as follows:

$$\mathit{updateVector} = \lambda\theta.(\mathsf{LpMech}\ (\mathit{VectcalcGrad}\ \theta)) : {!_\infty} \otimes_p^d \mathbb{R} \multimap {!_\epsilon}\mathsf{db} \multimap \bigcirc_P(\otimes_p^d \mathbb{R})$$

By iterating updateVector $n$ times one obtains a privacy budget of $n\epsilon$.

# 6 Implementation

To experiment with the Bunched Fuzz design, we implemented a prototype for a fragment of the system based on DFuzz [13,2].<sup>10</sup> The type-checker generates a set of numeric constraints that serve as verification conditions to guarantee a valid typing. The implementation required adapting some of the current rules to an algorithmic formulation (found in the full version). In addition to the modifications introduced in the DFuzz type checker compared to its original version [13,2], we also made the following changes and simplifications:


While, strictly speaking, the resulting system is incomplete with respect to the rules presented here, it is powerful enough to check an implementation of K-means that generalizes a previous version implemented for Fuzz [23]. On the other hand, because our implementation is based on the one of DFuzz, which features dependent types, we allow functions that are polymorphic on types, sizes and $p$ parameters, which allows us to infer sensitivity information that depends on run-time sizes.

# 7 Related Work

Bunched Fuzz is inspired by BI, the logic of bunched implications [22], which has two connectives for combining contexts. Categorically, one of these connectives corresponds to a Cartesian product, whereas the other corresponds to a monoidal, or tensor product. While related to linear logic, the presence of the two context connectives allows BI to derive some properties that are not valid in linear logic. For example, the Cartesian product does not distribute over sums in linear logic but it does distribute over sums in BI.

<sup>10</sup> https://github.com/junewunder/bunched-fuzz

We have shown how the rules for such type systems are reminiscent of the ones used in type systems for the calculus of bunched implications [21], and for reasoning about categorical grammars [19]. Specifically, O'Hearn introduces a type system with two products and two arrows [21]. Typing environments are bunches of variable assignments with two constructors, corresponding to the two products. Our work can be seen as a generalization of O'Hearn's work to handle multiple products and to reason about program sensitivity.

Moot and Retoré [19, Chapter 5] introduce the multimodal Lambek calculus, which extends the non-associative Lambek calculus, a classical tool for describing categorical grammars. This generalization uses an indexed family of connectives and trees to represent environments. The main differences with our work are: our indexed products are associative and commutative, while theirs are not; our type system is affine; our type system includes a monad for probabilities which does not have a corresponding construction in their logic; our type system also possesses the graded comonad $!_s$ corresponding to the ! modality of linear logic, and the interaction between this comonad and the bunches is non-trivial and requires us to explicitly define a notion of contraction. Besides the fact that the main properties we study, metric interpretation and program sensitivity, are very different from the ones studied by the above authors, there are some striking similarities between the two systems.

A recent work by Bao et al. [5] introduced a novel bunched logic with indexed products and magic wands with a preorder between the indices. This logic is used as the assertion logic of a separation logic introduced to reason about negative dependence between random variables. The connectives studied in this work share some similarities with the ones we study here and it would be interesting to investigate further the similarities, especially from a model-theoretic perspective.

Because contexts in the original Fuzz type system are biased towards the $L^1$ distance, it is not obvious how Fuzz could express the composition principles of the Hellinger distance. Recent work showed how this could be amended via a path construction that recasts relational program properties as sensitivity properties [3]. Roughly speaking, instead of working directly with the Hellinger distance $d_H$, the authors consider a family of relations $R_\alpha = \{(\mu_1, \mu_2) \mid d_H(\mu_1, \mu_2) \le \alpha\}$. Such a relation induces another metric on distributions, $d_{\alpha,H}$, where the distance between two distributions is the length of the shortest path connecting them in the graph corresponding to $R_\alpha$. This allows them to express the composition principles of the Hellinger distance directly in the Fuzz type system, albeit at a cost: the type constructor for probability distributions is graded by the distance bound $\alpha$. Thus, the sensitivity information of a randomized algorithm with respect to the Hellinger distance must also be encoded in the codomain of the function, as opposed to using just its domain, as done for the original privacy metric of Fuzz. By contrast, Bunched Fuzz does not require the grading $\alpha$ because it can express the composition principle of the Hellinger distance directly, thanks to the use of the $L^2$ distance on bunches.

Duet [20] can be seen as an extension of Fuzz to deal with more general privacy distances. It consists of a two-layer language: a sensitivity language and a privacy language. The sensitivity language is very similar to Fuzz. However, it also contains some basic primitives to manage vectors and matrices. As in Fuzz, the vector types come with multiple distances but, differently from Fuzz, Duet also uses the $L^2$ distance. The main reason for this is that Duet also supports the Gaussian mechanism, which calibrates the noise to the $L^2$ sensitivity of the function. Our work is inspired by this aspect of Duet, but it goes beyond it by giving a logical foundation to $L^p$ vector distances. Another language inspired by Fuzz is the recently proposed Jazz [24]. Like Duet, this language has two products and primitives tailored to the $L^2$ sensitivity of functions for the Gaussian mechanism. Interestingly, this language uses contextual information to achieve more precise bounds on the sensitivities. The semantics of Jazz is different from the metric semantics we study here; however, it would be interesting to explore whether a similar contextual approach could be also used in a metric setting.

# 8 Conclusion and Future work

In this work we have introduced Bunched Fuzz, a type system for reasoning about program sensitivity in the style of Fuzz [23]. Bunched Fuzz extends the type theory of Fuzz by considering new type constructors for $L^p$ distances and bunches to manage different products in typing environments. We have shown how this type system supports reasoning about both deterministic and probabilistic programs.

There are at least two directions that we would like to explore in future work. On the one hand, we would like to understand if the typing rules we introduced here could be of more general use in the setting of probabilistic programs. We have already discussed the usefulness for other directions in the deterministic case [19]. One way to approach this problem could be by looking at the family of products recently identified in [5]. These products give a model for a logic to reason about negative dependence between probabilistic variables. It would be interesting to see if the properties of these products match the ones we have here.

On the other hand, we would like to understand if Bunched Fuzz can be used to reason about more general examples in differential privacy. One way to approach this problem could be to consider examples based on the use of Hellinger distance that have been studied in the literature on probabilistic inference [6].

Acknowledgements This material is based upon work supported by the NSF under Grant Nos. 1845803 and 2040249. The third author was partially supported by the French Program "Investissements d'avenir" (I-ULNE SITE / ANR-16-IDEX-0004 ULNE) managed by the National Research Agency.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing

Basim Khajwal<sup>1</sup>, C.-H. Luke Ong<sup>1,2</sup>, and Dominik Wagner<sup>1</sup>

> <sup>1</sup> University of Oxford, Oxford, UK dominik.wagner@cs.ox.ac.uk <sup>2</sup> NTU, Singapore, Singapore

Abstract. We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits fast convergence in a traditional statistics setting. Unfortunately, discontinuities, which are readily expressible in programming languages, can compromise the correctness of this approach. We consider a simple (higher-order, probabilistic) programming language with conditionals, and we endow our language with both a measurable and a smoothed (approximate) value semantics. We present type systems which establish technical pre-conditions. Thus we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably. Empirically we demonstrate that our approach has a similar convergence as a key competitor, but is simpler, faster, and attains orders of magnitude reduction in work-normalised variance.

Keywords: probabilistic programming · variational inference · reparameterisation gradient · value semantics · type systems.

# 1 Introduction

Probabilistic programming is a programming paradigm which has the vision to make statistical methods, in particular Bayesian inference, accessible to a wide audience. This is achieved by a separation of concerns: the domain experts wishing to gain statistical insights focus on modelling, whilst the inference is performed automatically. (In some recent systems [4,9] users can improve efficiency by writing their own inference code.)

In essence, probabilistic programming languages extend more traditional programming languages with constructs such as score or observe (as well as sample) to define the prior $p(z)$ and likelihood $p(x \mid z)$. The task of inference is to derive the posterior $p(z \mid x)$, which is in principle governed by Bayes' law yet usually intractable.

Whilst the paradigm was originally conceived in the context of statistics and Bayesian machine learning, probabilistic programming has in recent years proven to be a very fruitful subject for the programming language community. Researchers have made significant theoretical contributions such as underpinning languages with rigorous (categorical) semantics [35,34,15,37,12,10] and investigating the correctness of inference algorithms [16,7,22]. The latter were mostly designed in the context of "traditional" statistics and features such as conditionals, which are ubiquitous in programming, pose a major challenge for correctness.

Inference algorithms broadly fall into two categories: Markov chain Monte Carlo (MCMC), which yields a sequence of samples asymptotically approaching the true posterior, and variational inference.

Variational Inference. In the variational inference approach to Bayesian statistics [40,30,5,6], the problem of approximating difficult-to-compute posterior probability distributions is transformed to an optimisation problem. The idea is to approximate the posterior probability $p(z \mid x)$ using a family of "simpler" densities $q_\theta(z)$ over the latent variables $z$, parameterised by $\theta$. The optimisation problem is then to find the parameter $\theta^*$ such that $q_{\theta^*}(z)$ is "closest" to the true posterior $p(z \mid x)$. Since the variational family may not contain the true posterior, $q_{\theta^*}$ is an approximation in general. In practice, variational inference has proven to yield good approximations much faster than MCMC.

Formally, the idea is captured by minimising the KL-divergence [30,5] between the variational approximation and the true posterior. This is equivalent to maximising the ELBO function, which only depends on the joint distribution p(x, z) and not the posterior, which we seek to infer after all:

$$\text{ELBO}\_{\theta} := \mathbb{E}\_{\mathbf{z} \sim q\_{\theta}(\mathbf{z})} [\log p(\mathbf{x}, \mathbf{z}) - \log q\_{\theta}(\mathbf{z})] \tag{1}$$

Gradient Based Optimisation. In practice, variants of Stochastic Gradient Descent (SGD) are frequently employed to solve optimisation problems of the following form: $\operatorname{argmin}_\theta \mathbb{E}_{s \sim q(s)}[f(\theta, s)]$. In its simplest version, SGD follows Monte Carlo estimates of the gradient in each step:

$$\boldsymbol{\theta}\_{k+1} := \boldsymbol{\theta}\_k - \gamma\_k \underbrace{\frac{1}{N} \sum\_{i=1}^N \nabla\_{\boldsymbol{\theta}} f\left(\boldsymbol{\theta}\_k, \mathbf{s}\_k^{(i)}\right)}\_{\text{gradient estimator}}$$

where $s_k^{(i)} \sim q\big(s_k^{(i)}\big)$ and $\gamma_k$ is the step size.
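A minimal Python sketch of this update rule follows. The objective $f(\theta, s) = (\theta - s)^2$ with $s \sim \mathcal{N}(0, 1)$, the step-size schedule and all names are assumptions chosen for illustration; the expectation $\mathbb{E}_s[f(\theta, s)] = \theta^2 + 1$ is minimised at $\theta = 0$.

```python
import random


def grad_f(theta, s):
    """Gradient w.r.t. theta of the example objective f(theta, s) = (theta - s)^2."""
    return 2.0 * (theta - s)


def sgd(theta0, num_steps, batch_size, rng):
    theta = theta0
    for k in range(num_steps):
        # Draw s_k^(1), ..., s_k^(N) from the base distribution q = N(0, 1).
        samples = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        # Monte Carlo gradient estimator: average of the per-sample gradients.
        estimate = sum(grad_f(theta, s) for s in samples) / batch_size
        gamma_k = 0.5 / (k + 1)  # decreasing step size (illustrative choice)
        theta = theta - gamma_k * estimate
    return theta


theta_final = sgd(theta0=3.0, num_steps=200, batch_size=10, rng=random.Random(0))
print(theta_final)
```

With this particular schedule the iterate is simply the running mean of all drawn samples, so it settles close to the minimiser $0$.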

For the correctness of SGD it is crucial that the estimation of the gradient is unbiased, i.e. correct in expectation:

$$\mathbb{E}\_{\mathbf{s}^{(1)},\ldots,\mathbf{s}^{(N)}\sim q} \left[ \frac{1}{N} \sum\_{i=1}^{N} \nabla\_{\theta} f \left( \theta, \mathbf{s}^{(i)} \right) \right] = \nabla\_{\theta} \mathbb{E}\_{\mathbf{s}\sim q(\mathbf{s})} [f(\theta, \mathbf{s})],$$

This property, which is about commuting differentiation and integration, can be established by the dominated convergence theorem [21, Theorem 6.28].

Note that we cannot directly estimate the gradient of the ELBO in Eq. (1) with Monte Carlo because the distribution w.r.t. which the expectation is taken also depends on the parameters. However, the so-called log-derivative trick can be used to derive an unbiased estimate, which is known as the Score or REINFORCE estimator [31,38,27,28].

Reparameterisation Gradient. Whilst the score estimator has the virtue of being very widely applicable, it unfortunately suffers from high variance, which can cause SGD to yield very poor results.<sup>3</sup>

The reparameterisation gradient estimator, the dominant approach in variational inference, reparameterises the latent variable $z$ in terms of a base random variable $s$ (viewed as the entropy source) via a diffeomorphic transformation $\phi_\theta$, such as a location-scale transformation or cumulative distribution function. For example, if the distribution of the latent variable $z$ is a Gaussian $\mathcal{N}(z \mid \mu, \sigma^2)$ with parameters $\theta = \{\mu, \sigma\}$ then the location-scale transformation using the standard normal as the base distribution gives rise to the reparameterisation

$$z \sim \mathcal{N}(z \mid \mu, \sigma^2) \iff z = \phi\_{\mu, \sigma}(s), \quad s \sim \mathcal{N}(0, 1). \tag{2}$$

where $\phi_{\mu,\sigma}(s) := s \cdot \sigma + \mu$. The key advantage of this setup (often called "reparameterisation trick" [20,36,32]) is that we have removed the dependency on $\theta$ from the distribution w.r.t. which the expectation is taken. Therefore, we can now differentiate (by backpropagation) with respect to the parameters $\theta$ of the variational distributions using Monte Carlo simulation with draws from the base distribution $s$. Thus, succinctly, we have

$$\nabla\_{\boldsymbol{\theta}} \mathbb{E}\_{\mathbf{z} \sim q\_{\boldsymbol{\theta}}(\mathbf{z})} [f(\boldsymbol{\theta}, \mathbf{z})] = \nabla\_{\boldsymbol{\theta}} \mathbb{E}\_{\mathbf{s} \sim q(\mathbf{s})} [f(\boldsymbol{\theta}, \boldsymbol{\phi}\_{\boldsymbol{\theta}}(\mathbf{s}))] = \mathbb{E}\_{\mathbf{s} \sim q(\mathbf{s})} [\nabla\_{\boldsymbol{\theta}} f(\boldsymbol{\theta}, \boldsymbol{\phi}\_{\boldsymbol{\theta}}(\mathbf{s}))]$$

The main benefit of the reparameterisation gradient estimator is that it has a significantly lower variance than the score estimator, resulting in faster convergence.
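The following Python sketch illustrates the estimator for the Gaussian location-scale reparameterisation of Eq. (2). The test function $f(z) = z^2$ is an assumption chosen because its exact gradient is known: $\mathbb{E}[z^2] = \mu^2 + \sigma^2$, so the gradient with respect to $(\mu, \sigma)$ is $(2\mu, 2\sigma)$. The function name and sample size are likewise illustrative.

```python
import random


def reparam_gradient(mu, sigma, num_samples, rng):
    """Monte Carlo estimate of the gradient of E_{z ~ N(mu, sigma^2)}[z^2].

    Uses z = phi_{mu,sigma}(s) = s * sigma + mu with s ~ N(0, 1), so that
    d/dmu z^2 = 2 z and d/dsigma z^2 = 2 z s by the chain rule.
    """
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        s = rng.gauss(0.0, 1.0)  # draw from the base distribution
        z = s * sigma + mu       # reparameterised latent variable
        grad_mu += 2.0 * z
        grad_sigma += 2.0 * z * s
    return grad_mu / num_samples, grad_sigma / num_samples


g_mu, g_sigma = reparam_gradient(mu=1.5, sigma=0.5, num_samples=100_000,
                                 rng=random.Random(0))
# The exact gradient at (mu, sigma) = (1.5, 0.5) is (3.0, 1.0).
print(g_mu, g_sigma)
```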

Bias of the Reparameterisation Gradient. Unfortunately, the reparameterisation gradient estimator is biased for non-differentiable models [23], which are readily expressible in programming languages with conditionals:

Example 1. The counterexample in [23, Proposition 2], where the objective function is the ELBO for a non-differentiable model, can be simplified to

$$f(\theta, s) = -0.5 \cdot \theta^2 + \begin{cases} 0 & \text{if } s + \theta < 0 \\ 1 & \text{otherwise} \end{cases}$$

Observe that (see Fig. 1a):

$$\nabla\_{\theta} \mathbb{E}\_{s \sim \mathcal{N}(0,1)} \left[ f(\theta, s) \right] = -\theta + \mathcal{N}(-\theta \mid 0, 1) \neq -\theta = \mathbb{E}\_{s \sim \mathcal{N}(0,1)} \left[ \nabla\_{\theta} f(\theta, s) \right]$$

<sup>3</sup> see e.g. Fig. 5a or [28]

(a) Dashed red: biased estimator $\mathbb{E}_{s \sim \mathcal{N}(0,1)}[\nabla_\theta f(\theta, s)]$, solid green: true gradient $\nabla_\theta \mathbb{E}_{s \sim \mathcal{N}(0,1)}[f(\theta, s)]$.

(b) ELBO trajectories (higher means better) obtained with our implementation (cf. Section 7)

Fig. 1: Bias of the reparameterisation gradient estimator for Example 1.

Crucially this may compromise convergence to critical points or maximisers: even if we can find a point where the gradient estimator vanishes, it may not be a critical point (let alone optimum) of the original optimisation problem (cf. Fig. 1b).
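The gap between the two quantities in Example 1 can be reproduced numerically. In the sketch below (all names are illustrative), the expectation is computed in closed form as $\mathbb{E}_{s}[f(\theta, s)] = -0.5\,\theta^2 + \Phi(\theta)$, where $\Phi$ is the standard normal cumulative distribution function, and differentiated by central finite differences. The reparameterisation estimator differentiates $f$ pointwise, which discards the contribution of the jump.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expected_f(theta):
    """E_{s ~ N(0,1)}[f(theta, s)] = -0.5 * theta^2 + P(s + theta >= 0)."""
    return -0.5 * theta ** 2 + std_normal_cdf(theta)


def true_gradient(theta, h=1e-5):
    """Central finite difference of the expectation."""
    return (expected_f(theta + h) - expected_f(theta - h)) / (2.0 * h)


def reparam_gradient(theta):
    """E_s[grad_theta f(theta, s)]: the branch has zero derivative almost
    everywhere, so only the term -0.5 * theta^2 contributes."""
    return -theta


for theta in (-1.0, 0.0, 1.0):
    closed_form = -theta + std_normal_pdf(-theta)
    print(theta, true_gradient(theta), closed_form, reparam_gradient(theta))
```

At $\theta = 0$ the biased estimator vanishes although the true gradient equals $\mathcal{N}(0 \mid 0, 1) \approx 0.399$, so $\theta = 0$ is not a critical point of the original objective.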

#### Informal Approach

As our starting point we take a variant of the simply typed lambda calculus with reals, conditionals and a sampling construct. We abstract the optimisation of the ELBO to the following generic optimisation problem

$$\operatorname{argmin}_{\theta}\ \mathbb{E}_{s \sim \mathcal{D}}\big[[\![M]\!](\theta, s)\big] \tag{3}$$

where $[\![M]\!]$ is the value function [7,26] of a program $M$ and $\mathcal{D}$ is independent of the parameters $\theta$ and it is determined by the distributions from which $M$ samples. Owing to the presence of conditionals, the function $[\![M]\!]$ may not be continuous, let alone differentiable.

Example 1 can be expressed as

$$(\lambda z.\, -0.5 \cdot \theta^2 + (\mathbf{if}\ z < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1))\ (\mathbf{sample}_{\mathcal{N}} + \theta)$$

Our approach is based on a denotational semantics $[\![(-)]\!]_\eta$ (for accuracy coefficient $\eta > 0$) of programs in the (new) cartesian closed category VectFr, which generalises smooth manifolds and extends Frölicher spaces (see e.g. [13,33]) with a vector space structure.

Fig. 2: (Logistic) sigmoid function $\sigma_\eta$ (dotted: $\eta = \frac{1}{3}$, dashed: $\eta = \frac{1}{15}$) and the Heaviside step function (red, solid).

Intuitively, we replace the Heaviside step function usually arising in the interpretation of conditionals by smooth approximations. In particular, we interpret the conditional of Example 1 as

$$[\![\mathbf{if}\ s + \theta < 0\ \mathbf{then}\ \underline{0}\ \mathbf{else}\ \underline{1}]\!]_{\eta}(\theta, s) := \sigma_{\eta}(s + \theta)$$

where $\sigma_\eta$ is a smooth function. For instance we can choose $\sigma_\eta(x) := \sigma(\frac{x}{\eta})$ where $\sigma(x) := \frac{1}{1+\exp(-x)}$ is the (logistic) sigmoid function (cf. Fig. 2). Thus, the program $M$ is interpreted by a smooth function $[\![M]\!]_\eta$, for which the reparameterisation gradient may be estimated unbiasedly. Therefore, we apply stochastic gradient descent on the smoothed program.
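To illustrate the effect of smoothing on Example 1, the following sketch estimates the gradient of the smoothed objective $-0.5\,\theta^2 + \sigma_\eta(s + \theta)$ by Monte Carlo and compares it with the true gradient $-\theta + \mathcal{N}(-\theta \mid 0, 1)$ of the original problem. The accuracy coefficient $\eta = 0.1$, the sample size and the function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_grad(theta, s, eta):
    """Gradient w.r.t. theta of -0.5 * theta^2 + sigmoid((s + theta) / eta)."""
    sig = sigmoid((s + theta) / eta)
    return -theta + sig * (1.0 - sig) / eta


def estimate(theta, eta, num_samples, rng):
    """Reparameterisation gradient estimate for the smoothed program."""
    total = sum(smoothed_grad(theta, rng.gauss(0.0, 1.0), eta)
                for _ in range(num_samples))
    return total / num_samples


def true_grad(theta):
    """Gradient of the original (unsmoothed) objective of Example 1."""
    return -theta + math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


rng = random.Random(0)
theta = 0.0
approx = estimate(theta, eta=0.1, num_samples=100_000, rng=rng)
print(approx, true_grad(theta))
```

Unlike the unsmoothed estimator, which returns $-\theta$, the smoothed estimate stays close to the true gradient; the remaining gap shrinks as $\eta$ decreases, at the price of a higher variance.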

#### Contributions

The high-level contribution of this paper is laying a theoretical foundation for correct yet efficient (variational) inference for probabilistic programming. We employ a smoothed interpretation of programs to obtain unbiased (reparameterisation) gradient estimators and establish technical pre-conditions by type systems. In more detail:


Outline. In the next section we introduce a simple higher-order probabilistic programming language, its denotational value semantics and operational semantics; Optimisation Problem 1 is then stated. Section 3 is devoted to a smoothed denotational value semantics, and we state the Smooth Optimisation Problem 2. In Sections 4 and 5 we develop annotation based type systems enforcing the correctness of SGD and the convergence of the smoothing, respectively. Related work is briefly discussed in Section 6 before we present the results of our empirical evaluation in Section 7. We conclude in Section 8 and discuss future directions.

Notation. We use the following conventions: bold font for vectors and lists, ++ for concatenation of lists, ∇<sub>θ</sub> for gradients (w.r.t. θ), [φ] for the Iverson bracket of a predicate φ, and calligraphic font for distributions, in particular N for normal distributions. Besides, we highlight noteworthy items using red.

# 2 A Simple Programming Language

In this section, we introduce our programming language, which is the simply-typed lambda calculus with reals, augmented with conditionals and sampling from continuous distributions.

# 2.1 Syntax

The raw terms of the programming language are defined by the grammar:

$$\begin{aligned} M &::= x \mid \theta\_i \mid \underline{r} \mid \underline{+} \mid \underline{\cdot} \mid \underline{-} \mid \underline{{}^{-1}} \mid \underline{\exp} \mid \underline{\log} \\ &\mid \mathbf{if}\ M < 0 \ \mathbf{then}\ M \ \mathbf{else}\ M \mid \mathbf{sample}\_{\mathcal{D}} \mid \lambda x.\, M \mid M \, M \end{aligned}$$

where x and θ<sub>i</sub> respectively range over (denumerable collections of) variables and parameters, r ∈ R, and D is a probability distribution over R (potentially with a support which is a strict subset of R). As is customary, we use infix, postfix and prefix notation: M + N (addition), M · N (multiplication), M<sup>−1</sup> (inverse), and −M (numeric negation). We frequently omit the underline to reduce clutter.

Example 2 (Encoding the ELBO for Variational Inference). We consider the example used by [23] in their Prop. 2 to prove the biasedness of the reparameterisation gradient. (In Example 1 we discussed a simplified version thereof.) The density is

$$p(z) := \mathcal{N}(z \mid 0, 1) \cdot \begin{cases} \mathcal{N}(0 \mid -2, 1) & \text{if } z < 0 \\ \mathcal{N}(0 \mid 5, 1) & \text{otherwise} \end{cases}$$

and they use a variational family with density q<sub>θ</sub>(z) := N(z | θ, 1), which is reparameterised using a standard normal noise distribution and transformation s ↦ s + θ.

First, we define an auxiliary term for the pdf of normals with mean m and standard deviation s:

$$N \equiv \lambda x, m, s.\ \left(\underline{\sqrt{2\pi}} \cdot s\right)^{-1} \cdot \underline{\exp}\left(\underline{-0.5}\cdot\left(\left(x+\left(-m\right)\right)\cdot s^{-1}\right)^{2}\right)$$

Then, we can define

$$M \equiv \Big( \lambda z.\ \underbrace{\underline{\log} \left( N \, z \, \underline{0} \, \underline{1} \right) + \left( \mathbf{if} \ z < 0 \ \mathbf{then} \ \underline{\log} \left( N \, \underline{0} \, (\underline{-2}) \, \underline{1} \right) \ \mathbf{else} \ \underline{\log} \left( N \, \underline{0} \, \underline{5} \, \underline{1} \right) \right)}\_{\log p} - \underbrace{\underline{\log} \left( N \, z \, \theta \, \underline{1} \right)}\_{\log q} \Big) \left( \mathbf{sample}\_{\mathcal{N}} + \theta \right)$$
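To make the shape of the term M concrete, the following Python sketch evaluates the ELBO integrand log p(z) − log q<sub>θ</sub>(z) at z = s + θ for a given noise value s. It mirrors the structure of the term rather than aiming at numerical robustness, and the names `normal_pdf` and `elbo_integrand` are chosen here for illustration only.

```python
import math


def normal_pdf(x: float, m: float, s: float) -> float:
    """Density of a normal distribution with mean m and standard deviation s (the term N)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)


def elbo_integrand(theta: float, s: float) -> float:
    """Value of the term M for parameter theta when sample_N returns s."""
    z = s + theta  # reparameterisation s |-> s + theta
    if z < 0:
        branch = math.log(normal_pdf(0.0, -2.0, 1.0))
    else:
        branch = math.log(normal_pdf(0.0, 5.0, 1.0))
    log_p = math.log(normal_pdf(z, 0.0, 1.0)) + branch
    log_q = math.log(normal_pdf(z, theta, 1.0))
    return log_p - log_q
```

The integrand jumps at z = 0 because the two branches evaluate to different constants. This discontinuity is what the smoothed interpretation of Section 3 replaces by a σ<sub>η</sub>-weighted combination.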

#### 2.2 A Basic Trace-Based Type System

Types are generated from base types (R and R<sub>>0</sub>, the reals and positive reals) and trace types (typically Σ, which is a finite list of probability distributions) as well as by a trace-based function space constructor of the form τ • Σ → τ′. Formally, types are defined by the following grammar:


where D<sub>i</sub> are probability distributions. Intuitively, a trace type is a description of the space of execution traces of a probabilistic program. Using trace types, a distinctive feature of our type system is that a program's type precisely characterises the space of its possible execution traces [24]. We use list concatenation notation ++ for trace types, and the shorthand τ<sub>1</sub> → τ<sub>2</sub> for function types of the form τ<sub>1</sub> • [] → τ<sub>2</sub>. Intuitively, a term has type τ • Σ → τ′ if, when given a value of type τ, it reduces to a value of type τ′ using all the samples in Σ.

Dual context typing judgements of the form Γ | Σ ⊢ M : τ are defined in Fig. 3b, where Γ = x<sub>1</sub> : τ<sub>1</sub>, …, x<sub>n</sub> : τ<sub>n</sub>, θ<sub>1</sub> : τ′<sub>1</sub>, …, θ<sub>m</sub> : τ′<sub>m</sub> is a finite map describing a set of variable-type and parameter-type bindings; and the trace type Σ precisely captures the distributions from which samples are drawn in a (fully eager) call-by-value evaluation of the term M.

The subtyping of types, as defined in Fig. 3a, is essentially standard; for contexts, we define Γ ⊑ Γ′ if for every x : τ in Γ there exists x : τ′ in Γ′ such that τ′ ⊑ τ.

Trace types are unique [18]:

Lemma 1. If Γ | Σ ⊢ M : τ and Γ | Σ′ ⊢ M : τ′ then Σ = Σ′.

A term has safe type σ if it does not contain sample<sub>D</sub> or σ is a base type. Thus, perhaps slightly confusingly, we have | [D] ⊢ sample<sub>D</sub> : R, and R is considered a safe type. Note that we use the metavariable σ to denote safe types.

Conditionals. The branches of conditionals must have a safe type. Otherwise it would not be clear how to type terms such as

$$\begin{aligned} M &\equiv \mathbf{if}\ x < 0 \ \mathbf{then}\ (\lambda x.\, \mathbf{sample}\_{\mathcal{N}}) \ \mathbf{else}\ (\lambda x.\, \mathbf{sample}\_{\mathcal{E}} + \mathbf{sample}\_{\mathcal{E}})\\ N &\equiv \left(\lambda f.\, f \left(f \, \mathbf{sample}\_{\mathcal{N}}\right)\right) M \end{aligned}$$

because the branches draw a different number of samples from different distributions, and have types R • [N] → R and R • [E, E] → R, respectively. However, for M′ ≡ if x < 0 then sample<sub>N</sub> else sample<sub>E</sub> + sample<sub>E</sub> we can (safely) type

$$\begin{aligned} x : R \mid [\mathcal{N}, \mathcal{E}, \mathcal{E}] &\vdash M' : R\\ \mid [] &\vdash \lambda x.\, M' : R \bullet [\mathcal{N}, \mathcal{E}, \mathcal{E}] \to R\\ \mid [\mathcal{N}, \mathcal{N}, \mathcal{E}, \mathcal{E}, \mathcal{N}, \mathcal{E}, \mathcal{E}] &\vdash (\lambda f.\, f \left(f \, \mathbf{sample}\_{\mathcal{N}}\right))\left(\lambda x.\, M'\right) : R\end{aligned}$$

$$\begin{gathered} \frac{}{\iota \sqsubseteq \iota} \qquad \frac{}{R\_{>0} \sqsubseteq R} \qquad \frac{\tau\_1' \sqsubseteq \tau\_1 \quad \tau\_2 \sqsubseteq \tau\_2'}{(\tau\_1 \bullet \Sigma \to \tau\_2) \sqsubseteq (\tau\_1' \bullet \Sigma \to \tau\_2')} \end{gathered}$$

(a) Subtyping

$$\begin{gathered} \frac{\Gamma \mid \Sigma \vdash M : \tau}{\Gamma' \mid \Sigma \vdash M : \tau'}\ \Gamma \sqsubseteq \Gamma',\ \tau \sqsubseteq \tau' \qquad \frac{}{x : \tau \mid [] \vdash x : \tau} \\[1ex] \frac{}{\mid [] \vdash r : R}\ r \in \mathbb{R} \qquad \frac{}{\mid [] \vdash r : R\_{>0}}\ r \in \mathbb{R}\_{>0} \\[1ex] \frac{}{\mid [] \vdash \circ : R \to R \to R}\ \circ \in \{+, \cdot\} \qquad \frac{}{\mid [] \vdash \circ : R\_{>0} \to R\_{>0} \to R\_{>0}}\ \circ \in \{+, \cdot\} \\[1ex] \frac{}{\mid [] \vdash - : R \to R} \qquad \frac{}{\mid [] \vdash {}^{-1} : R\_{>0} \to R\_{>0}} \qquad \frac{}{\mid [] \vdash \exp : R \to R\_{>0}} \qquad \frac{}{\mid [] \vdash \log : R\_{>0} \to R} \\[1ex] \frac{\Gamma \mid \Sigma \vdash L : R \quad \Gamma \mid \Sigma' \vdash M : \sigma \quad \Gamma \mid \Sigma'' \vdash N : \sigma}{\Gamma \mid \Sigma \mathbin{+\!\!+} \Sigma' \mathbin{+\!\!+} \Sigma'' \vdash \mathbf{if}\ L < 0 \ \mathbf{then}\ M \ \mathbf{else}\ N : \sigma} \qquad \frac{}{\mid [\mathcal{D}] \vdash \mathbf{sample}\_{\mathcal{D}} : R} \\[1ex] \frac{\Gamma, y : \tau\_1 \mid \Sigma \vdash M : \tau\_2}{\Gamma \mid [] \vdash \lambda y.\, M : \tau\_1 \bullet \Sigma \to \tau\_2} \qquad \frac{\Gamma \mid \Sigma\_1 \vdash M : \tau\_1 \bullet \Sigma\_3 \to \tau\_2 \quad \Gamma \mid \Sigma\_2 \vdash N : \tau\_1}{\Gamma \mid \Sigma\_1 \mathbin{+\!\!+} \Sigma\_2 \mathbin{+\!\!+} \Sigma\_3 \vdash M \, N : \tau\_2} \end{gathered}$$

(b) Typing judgments

Fig. 3: A Basic Trace-based Type System

Example 3. Consider the following terms:

$$\begin{aligned} L &\equiv \lambda x.\, \mathbf{sample}\_{\mathcal{N}} + \mathbf{sample}\_{\mathcal{N}} \\ M &\equiv \mathbf{if}\ x < 0 \ \mathbf{then}\ (\lambda y.\, y + y)\, \mathbf{sample}\_{\mathcal{N}} \ \mathbf{else}\ (\mathbf{sample}\_{\mathcal{N}} + \mathbf{sample}\_{\mathcal{N}}) \end{aligned}$$

We can derive the following typing judgements:

$$\begin{aligned} \mid [] &\vdash L : R\_{>0} \bullet [\mathcal{N}, \mathcal{N}] \to R \\ x : R\_{>0} \mid [\mathcal{N}, \mathcal{N}, \mathcal{N}] &\vdash M : R \\ \mid [] &\vdash \lambda x.\, M : R\_{>0} \bullet [\mathcal{N}, \mathcal{N}, \mathcal{N}] \to R \\ \mid [\mathcal{N}, \mathcal{N}, \mathcal{N}, \mathcal{N}] &\vdash (\lambda x.\, M)\, \mathbf{sample}\_{\mathcal{N}} : R \\ \mid [\mathcal{N}, \mathcal{N}] &\vdash (\lambda f.\, f \,(f \, 0)) \,(\lambda x.\, \mathbf{sample}\_{\mathcal{N}}) : R \end{aligned}$$

Note that if x < 0 then (λx. sample<sub>N</sub>) else (λx. x) is not typable.
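The trace bookkeeping of Fig. 3b can be replayed mechanically. The Python sketch below infers a trace type and a type for a small fragment of the language; it makes several simplifying assumptions that are not part of the paper: there is a single base type `R` (the refinement R<sub>>0</sub> and subtyping are omitted), lambda binders carry an explicit argument type, addition is built in as a binary node, and a type is treated as safe when it is a base type or a function type with empty trace between safe types. The tuple encoding of terms and the names `fun`, `is_safe` and `infer` are hypothetical.

```python
R = "R"  # the only base type in this sketch


def fun(arg, trace, res):
    """Function type `arg . trace -> res` with the trace stored as a tuple of distribution names."""
    return ("fun", arg, tuple(trace), res)


def is_safe(ty):
    """Assumed reading of safe types: base types, or functions with empty trace on safe types."""
    if ty == R:
        return True
    _, arg, trace, res = ty
    return trace == () and is_safe(arg) and is_safe(res)


def infer(term, env):
    """Return (trace type, type) of `term` under the typing context `env`."""
    tag = term[0]
    if tag == "var":
        return (), env[term[1]]
    if tag == "const":
        return (), R
    if tag == "sample":
        return (term[1],), R
    if tag == "add":  # M + N draws the samples of M, then those of N
        (s1, t1), (s2, t2) = infer(term[1], env), infer(term[2], env)
        if t1 != R or t2 != R:
            raise TypeError("addition expects reals")
        return s1 + s2, R
    if tag == "if":  # if L < 0 then M else N: all three traces are concatenated
        s1, t1 = infer(term[1], env)
        s2, t2 = infer(term[2], env)
        s3, t3 = infer(term[3], env)
        if t1 != R or t2 != t3 or not is_safe(t2):
            raise TypeError("branches must share a safe type")
        return s1 + s2 + s3, t2
    if tag == "lam":  # the body's trace moves into the function type
        _, x, arg_ty, body = term
        s, t = infer(body, {**env, x: arg_ty})
        return (), fun(arg_ty, s, t)
    if tag == "app":  # Sigma_1 ++ Sigma_2 ++ Sigma_3
        s1, t1 = infer(term[1], env)
        s2, t2 = infer(term[2], env)
        if t1 == R or t1[1] != t2:
            raise TypeError("ill-typed application")
        return s1 + s2 + t1[2], t1[3]
    raise ValueError(f"unknown term: {tag}")
```

For instance, with `samp = ("sample", "N")`, calling `infer(("lam", "x", R, ("add", samp, samp)), {})` yields an empty trace together with the function type whose latent trace lists two normal samples, as in the first judgement of Example 3.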

#### 2.3 Denotational Value Semantics

Next, we endow our language with a (measurable) value semantics. It is well-known that the category of measurable spaces and measurable functions is not cartesian-closed [1], which means that there is no interpretation of the lambda calculus as measurable functions. These difficulties led [14] to develop the category QBS of quasi-Borel spaces. Notably, morphisms can be combined piecewise, which we need for conditionals.

We interpret our programming language in the category QBS of quasi-Borel spaces. Types are interpreted as follows:

$$\begin{aligned} \llbracket R \rrbracket := \left(\mathbb{R}, M\_{\mathbb{R}}\right) \qquad \llbracket R\_{>0} \rrbracket := \left(\mathbb{R}\_{>0}, M\_{\mathbb{R}\_{>0}}\right) \qquad \llbracket [\mathcal{D}\_1, \dots, \mathcal{D}\_n] \rrbracket := \left(\mathbb{R}, M\_{\mathbb{R}}\right)^n\\ \llbracket \tau\_1 \bullet \Sigma \to \tau\_2 \rrbracket := \llbracket \tau\_1 \rrbracket \times \llbracket \Sigma \rrbracket \Rightarrow \llbracket \tau\_2 \rrbracket \end{aligned}$$

where M<sub>R</sub> is the set of measurable functions R → R; similarly for M<sub>R<sub>>0</sub></sub>. (As for trace types, we use list notation (and list concatenation) for traces.)

We first define a handy helper function for interpreting application. For f : ⟦Γ⟧ × R<sup>n<sub>1</sub></sup> ⇒ ⟦τ<sub>1</sub> • Σ<sub>3</sub> → τ<sub>2</sub>⟧ and g : ⟦Γ⟧ × R<sup>n<sub>2</sub></sup> ⇒ ⟦τ<sub>1</sub>⟧ define

$$\begin{aligned} f \mathbin{@} g &: \llbracket \Gamma \rrbracket \times \mathbb{R}^{n\_1 + n\_2 + |\Sigma\_3|} \Rightarrow \llbracket \tau\_2 \rrbracket \\ & (\gamma, \mathbf{s}\_1 \mathbin{+\!\!+} \mathbf{s}\_2 \mathbin{+\!\!+} \mathbf{s}\_3) \mapsto f(\gamma, \mathbf{s}\_1) (g(\gamma, \mathbf{s}\_2), \mathbf{s}\_3) \qquad \mathbf{s}\_1 \in \mathbb{R}^{n\_1}, \mathbf{s}\_2 \in \mathbb{R}^{n\_2}, \mathbf{s}\_3 \in \mathbb{R}^{|\Sigma\_3|} \end{aligned}$$

We interpret terms-in-context, ⟦Γ | Σ ⊢ M : τ⟧ : ⟦Γ⟧ × ⟦Σ⟧ → ⟦τ⟧, as follows:

$$\begin{aligned} &\llbracket \Gamma \mid [\mathcal{D}] \vdash \mathbf{sample}\_{\mathcal{D}} : R \rrbracket(\gamma, [s]) := s \\ &\llbracket \Gamma \mid [] \vdash \lambda y.\, M : \tau\_1 \bullet \Sigma \to \tau\_2 \rrbracket(\gamma, []) := (v, \mathbf{s}) \in \llbracket \tau\_1 \rrbracket \times \llbracket \Sigma \rrbracket \mapsto \llbracket \Gamma, y : \tau\_1 \mid \Sigma \vdash M : \tau\_2 \rrbracket((\gamma, v), \mathbf{s}) \\ &\llbracket \Gamma \mid \Sigma\_1 \mathbin{+\!\!+} \Sigma\_2 \mathbin{+\!\!+} \Sigma\_3 \vdash M \, N : \tau\_2 \rrbracket := \llbracket \Gamma \mid \Sigma\_1 \vdash M : \tau\_1 \bullet \Sigma\_3 \to \tau\_2 \rrbracket \mathbin{@} \llbracket \Gamma \mid \Sigma\_2 \vdash N : \tau\_1 \rrbracket \\ &\llbracket \Gamma \mid \Sigma\_1 \mathbin{+\!\!+} \Sigma\_2 \mathbin{+\!\!+} \Sigma\_3 \vdash \mathbf{if}\ L < 0 \ \mathbf{then}\ M \ \mathbf{else}\ N : \tau \rrbracket(\gamma, \mathbf{s}\_1 \mathbin{+\!\!+} \mathbf{s}\_2 \mathbin{+\!\!+} \mathbf{s}\_3) := \begin{cases} \llbracket \Gamma \mid \Sigma\_2 \vdash M : \tau \rrbracket(\gamma, \mathbf{s}\_2) & \text{if } \llbracket \Gamma \mid \Sigma\_1 \vdash L : R \rrbracket(\gamma, \mathbf{s}\_1) < 0 \\ \llbracket \Gamma \mid \Sigma\_3 \vdash N : \tau \rrbracket(\gamma, \mathbf{s}\_3) & \text{otherwise} \end{cases} \end{aligned}$$

It is not difficult to see that this interpretation of terms-in-context is well-defined and total. For the conditional clause, we may assume that the trace type and the trace are presented as partitions Σ<sub>1</sub> ++ Σ<sub>2</sub> ++ Σ<sub>3</sub> and s<sub>1</sub> ++ s<sub>2</sub> ++ s<sub>3</sub> respectively. This is justified because it follows from the judgement Γ | Σ<sub>1</sub> ++ Σ<sub>2</sub> ++ Σ<sub>3</sub> ⊢ if L < 0 then M else N : τ that Γ | Σ<sub>1</sub> ⊢ L : R, Γ | Σ<sub>2</sub> ⊢ M : σ and Γ | Σ<sub>3</sub> ⊢ N : σ are provable; and we know that each of Σ<sub>1</sub>, Σ<sub>2</sub> and Σ<sub>3</sub> is unique, thanks to Lemma 1; their respective lengths then determine the partition s<sub>1</sub> ++ s<sub>2</sub> ++ s<sub>3</sub>. Similarly for the application clause, the components Σ<sub>1</sub> and Σ<sub>2</sub> are determined by Lemma 1, and Σ<sub>3</sub> by the type of M.
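The role of the trace partition in the conditional clause can be illustrated with a small Python sketch. Denotations are modelled as functions from an environment and a trace to a real; the helper names `sample` and `conditional`, and the explicit length arguments `n1`, `n2`, `n3` (standing for |Σ<sub>1</sub>|, |Σ<sub>2</sub>|, |Σ<sub>3</sub>|), are assumptions made for this example.

```python
from typing import Callable, Dict, Sequence

Env = Dict[str, float]
Denotation = Callable[[Env, Sequence[float]], float]


def sample() -> Denotation:
    """Denotation of sample_D: the single entry of the trace is the value."""
    def run(gamma: Env, trace: Sequence[float]) -> float:
        (s,) = trace  # exactly one sample is consumed
        return s
    return run


def conditional(cond: Denotation, n1: int,
                then_: Denotation, n2: int,
                else_: Denotation, n3: int) -> Denotation:
    """Denotation of `if L < 0 then M else N` from the denotations of L, M and N."""
    def run(gamma: Env, trace: Sequence[float]) -> float:
        if len(trace) != n1 + n2 + n3:
            raise ValueError("trace length does not match the trace type")
        # Split the trace as s1 ++ s2 ++ s3 according to the three trace types.
        s1 = trace[:n1]
        s2 = trace[n1:n1 + n2]
        s3 = trace[n1 + n2:]
        if cond(gamma, s1) < 0:
            return then_(gamma, s2)
        return else_(gamma, s3)
    return run


# M' = if x < 0 then sample_N else sample_E + sample_E, with trace type [N, E, E]
m_prime = conditional(
    cond=lambda gamma, s: gamma["x"], n1=0,
    then_=sample(), n2=1,
    else_=lambda gamma, s: s[0] + s[1], n3=2,
)
```

Each branch only sees its own segment of the trace, so the trace always has length 3 regardless of which branch is selected.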

#### 2.4 Relation to Operational Semantics

We can also endow our language with a big-step CBV sampling-based semantics similar to [7,26], as defined in [18, Fig. 6]. We write M ⇓<sup>s</sup><sub>w</sub> V to mean that M reduces to value V, which is a real constant or an abstraction, using the execution trace s and accumulating weight w.

Based on this, we can define the value- and weight-functions:

$$\text{value}\_M(\mathbf{s}) := \begin{cases} V & \text{if } M \Downarrow\_w^\mathbf{s} V \\ \text{undef} & \text{otherwise} \end{cases} \qquad \text{weight}\_M(\mathbf{s}) := \begin{cases} w & \text{if } M \Downarrow\_w^\mathbf{s} V \\ 0 & \text{otherwise} \end{cases}$$

Our semantics is a bit non-standard in that for conditionals we evaluate both branches eagerly. The technical advantage is that for every (closed) term-in-context, | [D<sub>1</sub>, …, D<sub>n</sub>] ⊢ M : ι, M reduces to a (unique) value using exactly the traces of the length encoded in the typing, i.e., n.

So in this sense, the operational semantics is "total": there is no divergence. Notice that there is no partiality caused by partial primitives such as 1/x, thanks to the typing.

Moreover there is a simple connection to our denotational value semantics:

Proposition 1. Let | [D<sub>1</sub>, …, D<sub>n</sub>] ⊢ M : ι. Then


#### 2.5 Problem Statement

We are finally ready to formally state our optimisation problem:

Problem 1. Optimisation

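Given: term-in-context, θ<sub>1</sub> : ι<sub>1</sub>, …, θ<sub>m</sub> : ι<sub>m</sub> | [D<sub>1</sub>, …, D<sub>n</sub>] ⊢ M : R

Find:

$$\operatorname{argmin}\_{\boldsymbol{\theta}}\ \mathbb{E}\_{s\_1 \sim \mathcal{D}\_1, \dots, s\_n \sim \mathcal{D}\_n}\left[\llbracket M \rrbracket(\boldsymbol{\theta}, \mathbf{s})\right]$$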

# 3 Smoothed Denotational Value Semantics

Now we turn to our smoothed denotational value semantics, which we use to avoid the bias in the reparameterisation gradient estimator. It is parameterised by a family of smooth functions σ<sub>η</sub> : R → [0, 1]. Intuitively, we replace the Heaviside step function arising in the interpretation of conditionals by smooth approximations (cf. Fig. 2). In particular, conditionals if z < 0 then 0 else 1 are interpreted as z ↦ σ<sub>η</sub>(z) rather than [z ≥ 0] (using Iverson brackets).

Our primary example is σ<sub>η</sub>(x) := σ(x/η), where σ is the (logistic) sigmoid σ(x) := 1/(1 + exp(−x)), see Fig. 2. Whilst at this stage no further properties other than smoothness are required, we will later need to restrict σ<sub>η</sub> to have good properties, in particular convergence to the Heaviside step function.

As a categorical model we propose vector Frölicher spaces VectFr, which (to our knowledge) is a new construction, affording a simple and direct interpretation of the smoothed conditionals.

#### 3.1 Frölicher Spaces

We recall the definition of Frölicher spaces, which generalise smooth spaces<sup>4</sup>: A Frölicher space is a triple (X, C<sub>X</sub>, F<sub>X</sub>) where X is a set, C<sub>X</sub> ⊆ Set(R, X) is a set of curves and F<sub>X</sub> ⊆ Set(X, R) is a set of functionals, satisfying

1. if c ∈ C<sub>X</sub> and f ∈ F<sub>X</sub> then f ◦ c ∈ C<sup>∞</sup>(R, R)

2. if c : R → X such that for all f ∈ F<sub>X</sub>, f ◦ c ∈ C<sup>∞</sup>(R, R) then c ∈ C<sub>X</sub>

3. if f : X → R such that for all c ∈ C<sub>X</sub>, f ◦ c ∈ C<sup>∞</sup>(R, R) then f ∈ F<sub>X</sub>.

A morphism between Frölicher spaces (X, C<sub>X</sub>, F<sub>X</sub>) and (Y, C<sub>Y</sub>, F<sub>Y</sub>) is a map φ : X → Y satisfying f ◦ φ ◦ c ∈ C<sup>∞</sup>(R, R) for all f ∈ F<sub>Y</sub> and c ∈ C<sub>X</sub>.

Frölicher spaces and their morphisms constitute a category Fr, which is well-known to be cartesian closed [13,33].

#### 3.2 Vector Frölicher Spaces

To interpret our programming language smoothly we would like to interpret conditionals as σ<sub>η</sub>-weighted convex combinations of their branches:

$$\begin{aligned} \llbracket \mathbf{if}\ L < 0 \ \mathbf{then}\ M \ \mathbf{else}\ N \rrbracket\_{\eta} (\gamma, \mathbf{s}\_1 \mathbin{+\!\!+} \mathbf{s}\_2 \mathbin{+\!\!+} \mathbf{s}\_3) &:= \\ \sigma\_{\eta} (- \llbracket L \rrbracket\_{\eta} (\gamma, \mathbf{s}\_1)) \cdot \llbracket M \rrbracket\_{\eta} (\gamma, \mathbf{s}\_2) &+ \sigma\_{\eta} (\llbracket L \rrbracket\_{\eta} (\gamma, \mathbf{s}\_1)) \cdot \llbracket N \rrbracket\_{\eta} (\gamma, \mathbf{s}\_3) \end{aligned} \tag{4}$$

By what we have discussed so far, this only makes sense if the branches have ground type, because Frölicher spaces are not equipped with a vector space structure but we take weighted combinations of morphisms. In particular, if φ<sub>1</sub>, φ<sub>2</sub> : X → Y and α : X → R are morphisms then α φ<sub>1</sub> + φ<sub>2</sub> ought to be a morphism too. Therefore, we enrich Frölicher spaces with an additional vector space structure:

Definition 1. An R-vector Frölicher space is a Frölicher space (X, C<sub>X</sub>, F<sub>X</sub>) such that X is an R-vector space and whenever c, c′ ∈ C<sub>X</sub> and α ∈ C<sup>∞</sup>(R, R) then α c + c′ ∈ C<sub>X</sub> (defined pointwise).

A morphism between R-vector Frölicher spaces is a morphism between Frölicher spaces, i.e. φ : (X, C<sub>X</sub>, F<sub>X</sub>) → (Y, C<sub>Y</sub>, F<sub>Y</sub>) is a morphism if for all c ∈ C<sub>X</sub> and f ∈ F<sub>Y</sub>, f ◦ φ ◦ c ∈ C<sup>∞</sup>(R, R).

R-vector Frölicher spaces and their morphisms constitute a category VectFr. There is an evident forgetful functor fully faithfully embedding VectFr in Fr. Note that the above restriction is a bit stronger than requiring that C<sub>X</sub> is also a vector space. (α is not necessarily a constant.) The main benefit is the following, which is crucial for the interpretation of conditionals as in Eq. (4):

Lemma 2. If φ<sub>1</sub>, φ<sub>2</sub> ∈ VectFr(X, Y) and α ∈ VectFr(X, R) then α φ<sub>1</sub> + φ<sub>2</sub> ∈ VectFr(X, Y) (defined pointwise).

Proof. Suppose c ∈ C<sub>X</sub> and f ∈ F<sub>Y</sub>. Then (α φ<sub>1</sub> + φ<sub>2</sub>) ◦ c = (α ◦ c) · (φ<sub>1</sub> ◦ c) + (φ<sub>2</sub> ◦ c) ∈ C<sub>Y</sub> (defined pointwise) and the claim follows.

<sup>4</sup> C<sup>∞</sup>(R, R) is the set of smooth functions R → R.

Similarly as for Frölicher spaces, if X is an R-vector space then any set of curves C ⊆ Set(R, X) generates an R-vector Frölicher space (X, C<sub>X</sub>, F<sub>X</sub>), where

$$\begin{aligned} \mathcal{F}\_X &:= \{ f : X \to \mathbb{R} \mid \forall c \in \mathcal{C}. f \circ c \in C^{\infty}(\mathbb{R}, \mathbb{R}) \} \\ \tilde{\mathcal{C}}\_X &:= \{ c : \mathbb{R} \to X \mid \forall f \in \mathcal{F}\_X. f \circ c \in C^{\infty}(\mathbb{R}, \mathbb{R}) \} \\ \mathcal{C}\_X &:= \left\{ \sum\_{i=1}^n \alpha\_i \, c\_i \mid n \in \mathbb{N}, \forall i \le n. \alpha\_i \in C^{\infty}(\mathbb{R}, \mathbb{R}), c\_i \in \tilde{\mathcal{C}}\_X \right\} \end{aligned}$$

Having modified the notion of Frölicher spaces generated by a set of curves, the proof for cartesian closure carries over [18] and we conclude:

Proposition 2. VectFr is cartesian closed.

#### 3.3 Smoothed Interpretation

We have now discussed all ingredients to interpret our language (smoothly) in the cartesian closed category VectFr. We call ⟦M⟧<sub>η</sub> the η-smoothing of ⟦M⟧ (or of M, by abuse of language). The interpretation is mostly standard and follows Section 2.3, except for the case for conditionals. The latter is given by Eq. (4), for which the additional vector space structure is required.
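As a worked instance of Eq. (4), consider the term if z < 0 then 0 else 1, so that L ≡ z and the branches are the constants 0 and 1, which consume no samples. The calculation below is a sanity check rather than part of the formal development; the second line additionally assumes the logistic choice σ<sub>η</sub>(x) = σ(x/η), for which the two weights sum to one.

$$\begin{aligned} \llbracket \mathbf{if}\ z < 0 \ \mathbf{then}\ \underline{0} \ \mathbf{else}\ \underline{1} \rrbracket\_{\eta}(z, []) &= \sigma\_{\eta}(-z) \cdot 0 + \sigma\_{\eta}(z) \cdot 1 = \sigma\_{\eta}(z) \\ \sigma\_{\eta}(-x) + \sigma\_{\eta}(x) &= \frac{1}{1 + \exp(x/\eta)} + \frac{1}{1 + \exp(-x/\eta)} = \frac{\exp(-x/\eta) + 1}{1 + \exp(-x/\eta)} = 1 \end{aligned}$$

The first line recovers the interpretation z ↦ σ<sub>η</sub>(z) described at the start of this section.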

Finally, we can phrase a smoothed version of our Optimisation Problem 1:

Problem 2. η-Smoothed Optimisation

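Given: term-in-context, θ<sub>1</sub> : ι<sub>1</sub>, …, θ<sub>m</sub> : ι<sub>m</sub> | [D<sub>1</sub>, …, D<sub>n</sub>] ⊢ M : R, and accuracy coefficient η > 0

Find:

$$\operatorname{argmin}\_{\boldsymbol{\theta}}\ \mathbb{E}\_{s\_1 \sim \mathcal{D}\_1, \dots, s\_n \sim \mathcal{D}\_n}\left[\llbracket M \rrbracket\_{\eta}(\boldsymbol{\theta}, \mathbf{s})\right]$$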

# 4 Correctness of SGD for Smoothed Problem and Unbiasedness of the Reparameterisation Gradient

Next, we apply stochastic gradient descent (SGD) with the reparameterisation gradient estimator to the smoothed problem (for the batch size N = 1):

$$\boldsymbol{\theta}\_{k+1} := \boldsymbol{\theta}\_k - \gamma\_k \cdot \nabla\_{\boldsymbol{\theta}} \llbracket M \rrbracket\_{\eta} \left(\boldsymbol{\theta}\_k, \mathbf{s}\_k\right) \qquad \qquad \mathbf{s}\_k \sim \mathcal{D} \tag{5}$$

where θ | [s ∼ D] ⊢ M : R (slightly abusing notation in the trace type).

A classical choice for the step-size sequence is γ<sub>k</sub> ∈ Θ(1/k), which satisfies the so-called Robbins-Monro criterion:

$$\sum\_{k \in \mathbb{N}} \gamma\_k = \infty \qquad\qquad \sum\_{k \in \mathbb{N}} \gamma\_k^2 < \infty \tag{6}$$

In this section we wish to establish the correctness of the SGD procedure applied to the smoothing Eq. (5).
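The update rule of Eq. (5) can be sketched in Python for a one-parameter toy problem. The objective used here, E<sub>s∼N(0,1)</sub>[0.5·θ² + σ<sub>η</sub>(s + θ)], is a hypothetical example chosen for illustration and is not taken from the paper; the gradient is derived by hand for the logistic σ<sub>η</sub>, and the names `grad_theta` and `sgd` are likewise made up for this sketch.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def grad_theta(theta: float, s: float, eta: float) -> float:
    """d/dtheta of the hypothetical smoothed integrand 0.5*theta**2 + sigma_eta(s + theta)."""
    p = sigmoid((s + theta) / eta)
    # Chain rule: sigma_eta'(x) = sigma(x/eta) * (1 - sigma(x/eta)) / eta
    return theta + p * (1.0 - p) / eta


def sgd(theta0: float, eta: float, steps: int, seed: int = 0) -> float:
    """SGD with batch size 1, step sizes gamma_k = 1/k and reparameterised noise s_k ~ N(0, 1)."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, steps + 1):
        s_k = rng.gauss(0.0, 1.0)
        gamma_k = 1.0 / k
        theta -= gamma_k * grad_theta(theta, s_k, eta)
    return theta
```

Because the smoothed integrand is differentiable in θ for every fixed s, the per-sample gradient is defined everywhere, whereas the unsmoothed step function would contribute a zero derivative almost everywhere and hide the dependence on θ.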

#### 4.1 Desiderata

First, we ought to take a step back and observe that the optimisation problems we are trying to solve can be ill-defined due to a failure of integrability: take M ≡ (λx. exp(x · x)) sample<sub>N</sub>; we have E<sub>z∼N</sub>[⟦M⟧(z)] = ∞, independently of parameters. Therefore, we aim to guarantee:

(SGD0) The optimisation problems (both smoothed and unsmoothed) are well-defined.
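For the term M ≡ (λx. exp(x · x)) sample<sub>N</sub> mentioned above, the failure of integrability can be checked directly, assuming N denotes the standard normal distribution:

$$\mathbb{E}\_{z \sim \mathcal{N}}\left[\exp(z^2)\right] = \int\_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) \exp(z^2) \, \mathrm{d}z = \frac{1}{\sqrt{2\pi}} \int\_{\mathbb{R}} \exp\left(\frac{z^2}{2}\right) \mathrm{d}z = \infty$$

since the remaining integrand is bounded below by 1 on all of R.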

Since E[⟦M⟧<sub>η</sub>(θ, s)] (and E[⟦M⟧(θ, s)]) may not be a convex function in the parameters θ, we cannot hope to always find global optima. We seek instead stationary points, where the gradient w.r.t. the parameters θ vanishes. The following result (whose proof is standard) provides sufficient conditions for the convergence of SGD to stationary points (see e.g. [3] or [2, Chapter 2]):

Proposition 3 (Convergence). Suppose (γ<sub>k</sub>)<sub>k∈N</sub> satisfies the Robbins-Monro criterion Eq. (6) and g(θ) := E<sub>s</sub>[f(θ, s)] is well-defined. If Θ ⊆ R<sup>m</sup> satisfies

(SGD1) Unbiasedness: ∇<sub>θ</sub>g(θ) = E<sub>s</sub>[∇<sub>θ</sub>f(θ, s)] for all θ ∈ Θ

(SGD2) g is L-Lipschitz smooth on Θ for some L > 0:

$$\|\nabla\_{\theta}g(\theta) - \nabla\_{\theta}g(\theta')\| \le L \cdot \|\theta - \theta'\| \qquad \text{for all } \theta, \theta' \in \Theta$$

(SGD3) Bounded Variance: sup<sub>θ∈Θ</sub> E<sub>s</sub>[‖∇<sub>θ</sub>f(θ, s)‖<sup>2</sup>] < ∞

then inf<sub>i∈N</sub> E[‖∇g(θ<sub>i</sub>)‖<sup>2</sup>] = 0 or θ<sub>i</sub> ∉ Θ for some i ∈ N.

Unbiasedness (SGD1) requires commuting differentiation and integration. The validity of this operation can be established by the dominated convergence theorem [21, Theorem 6.28], see [18]. To be applicable the partial derivatives of f w.r.t. the parameters need to be dominated uniformly by an integrable function. Formally:

Definition 2. Let f : Θ × R<sup>n</sup> → R and g : R<sup>n</sup> → R. We say that g uniformly dominates f if for all (θ, s) ∈ Θ × R<sup>n</sup>, |f(θ, s)| ≤ g(s).

Also note that for Lipschitz smoothness (SGD2) it suffices to uniformly bound the second-order partial derivatives.

In the remainder of this section we present two type systems which restrict the language to guarantee properties (SGD0) to (SGD3).

#### 4.2 Piecewise Polynomials and Distributions with Finite Moments

As a first illustrative step we consider a type system ⊢<sub>poly</sub>, which restricts terms to (piecewise) polynomials, and distributions with finite moments. Recall that a distribution D has (all) finite moments if for all p ∈ N, E<sub>s∼D</sub>[|s|<sup>p</sup>] < ∞. Distributions with finite moments include the following commonly used distributions: normal, exponential, logistic and gamma distributions. A non-example is the Cauchy distribution, which famously does not even have an expectation.

Definition 3. For a distribution D with finite moments, f : R<sup>n</sup> → R has (all) finite moments if for all p ∈ N, E<sub>s∼D</sub>[|f(s)|<sup>p</sup>] < ∞.

Functions with finite moments have good closure properties:

Lemma 3. If f, g : R<sup>n</sup> → R have (all) finite moments so do −f, f + g, f · g.
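One standard way to see these closure properties, sketched here for orientation and not necessarily the argument used in [18], relies on two elementary inequalities. For p ≥ 1 (the case p = 0 is trivial), convexity of t ↦ t<sup>p</sup> handles sums and the Cauchy-Schwarz inequality handles products:

$$\begin{aligned} \mathbb{E}\left[|{-f}|^p\right] &= \mathbb{E}\left[|f|^p\right] \\ \mathbb{E}\left[|f + g|^p\right] &\le 2^{p-1} \left( \mathbb{E}\left[|f|^p\right] + \mathbb{E}\left[|g|^p\right] \right) \\ \mathbb{E}\left[|f \cdot g|^p\right] &\le \sqrt{\mathbb{E}\left[|f|^{2p}\right] \cdot \mathbb{E}\left[|g|^{2p}\right]} \end{aligned}$$

Each right-hand side is finite because f and g have finite moments of every order, in particular of order 2p.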

In particular, if a distribution has finite moments then polynomials do, too. Consequently, intuitively, it suffices to do the following (the details are explicitly spelled out in [18]):

1. require that the distributions D in the sample rule have finite moments:

$$\frac{}{\mid [\mathcal{D}] \vdash\_{\text{poly}} \mathbf{sample}\_{\mathcal{D}} : R} \quad \mathcal{D} \text{ has finite moments}$$

2. remove the rules for <sup>−1</sup>, exp and log from the type system ⊢<sub>poly</sub>.

Type Soundness I: Well-Definedness. Henceforth, we fix parameters θ<sub>1</sub> : ι<sub>1</sub>, …, θ<sub>m</sub> : ι<sub>m</sub>. Intuitively, it is pretty obvious that ⟦M⟧ is a piecewise polynomial whenever θ | Σ ⊢<sub>poly</sub> M : ι. Nonetheless, we prove the property formally to illustrate our proof technique, a variant of logical relations, employed throughout the rest of the paper.

We define a slightly stronger logical predicate P<sup>(n)</sup><sub>τ</sub> on Θ × R<sup>n</sup> → ⟦τ⟧, which allows us to obtain a uniform upper bound:

1. f ∈ P<sup>(n)</sup><sub>ι</sub> if f is uniformly dominated by a function with finite moments

2. f ∈ P<sup>(n)</sup><sub>τ<sub>1</sub>•Σ<sub>3</sub>→τ<sub>2</sub></sub> if for all n<sub>2</sub> ∈ N and g ∈ P<sup>(n+n<sub>2</sub>)</sup><sub>τ<sub>1</sub></sub>, f ⊙ g ∈ P<sup>(n+n<sub>2</sub>+|Σ<sub>3</sub>|)</sup><sub>τ<sub>2</sub></sub>

where for f : Θ × R<sup>n<sub>1</sub></sup> → ⟦τ<sub>1</sub> • Σ<sub>3</sub> → τ<sub>2</sub>⟧ and g : Θ × R<sup>n<sub>1</sub>+n<sub>2</sub></sup> → ⟦τ<sub>1</sub>⟧ we define

$$\begin{aligned} f \odot g &: \Theta \times \mathbb{R}^{n\_1 + n\_2 + |\Sigma\_3|} \to \llbracket \tau\_2 \rrbracket \\ (\theta, \mathbf{s}\_1 \mathbin{+\!\!+} \mathbf{s}\_2 \mathbin{+\!\!+} \mathbf{s}\_3) &\mapsto f(\theta, \mathbf{s}\_1)(g(\theta, \mathbf{s}\_1 \mathbin{+\!\!+} \mathbf{s}\_2), \mathbf{s}\_3) \end{aligned}$$

Intuitively, g may depend on the samples in s<sub>2</sub> (in addition to s<sub>1</sub>) and the function application may consume further samples s<sub>3</sub> (as determined by the trace type Σ<sub>3</sub>). By induction on safe types we prove the following result, which is important for conditionals:

Lemma 4. If f ∈ P<sup>(n)</sup><sub>ι</sub> and g, h ∈ P<sup>(n)</sup><sub>σ</sub> then [f(−) < 0] · g + [f(−) ≥ 0] · h ∈ P<sup>(n)</sup><sub>σ</sub>.

Proof. For base types it follows from Lemma 3. Hence, suppose σ has the form σ<sub>1</sub> • [] → σ<sub>2</sub>. Let n<sub>2</sub> ∈ N and x ∈ P<sup>(n+n<sub>2</sub>)</sup><sub>σ<sub>1</sub></sub>. By definition, (g ⊙ x), (h ⊙ x) ∈ P<sup>(n+n<sub>2</sub>)</sup><sub>σ<sub>2</sub></sub>. Let f̂ be the extension (ignoring the additional samples) of f to Θ × R<sup>n+n<sub>2</sub></sup> → R. It is easy to see that also f̂ ∈ P<sup>(n+n<sub>2</sub>)</sup><sub>ι</sub>. By the inductive hypothesis,

$$[\hat{f}(-) < 0] \cdot (g \odot x) + [\hat{f}(-) \ge 0] \cdot (h \odot x) \in \mathcal{P}^{(n+n\_2)}\_{\sigma\_2}$$

Finally, by definition,

$$([f(-) < 0] \cdot g + [f(-) \ge 0] \cdot h) \odot x = [\hat{f}(-) < 0] \cdot (g \odot x) + [\hat{f}(-) \ge 0] \cdot (h \odot x)$$

Assumption 1 We assume that Θ ⊆ ⟦ι<sub>1</sub>⟧ × · · · × ⟦ι<sub>m</sub>⟧ is compact.

Lemma 5 (Fundamental). If θ, x<sub>1</sub> : τ<sub>1</sub>, …, x<sub>ℓ</sub> : τ<sub>ℓ</sub> | Σ ⊢<sub>poly</sub> M : τ, n ∈ N, ξ<sub>1</sub> ∈ P<sup>(n)</sup><sub>τ<sub>1</sub></sub>, …, ξ<sub>ℓ</sub> ∈ P<sup>(n)</sup><sub>τ<sub>ℓ</sub></sub> then ⟦M⟧ ∗ ⟨ξ<sub>1</sub>, …, ξ<sub>ℓ</sub>⟩ ∈ P<sup>(n+|Σ|)</sup><sub>τ</sub>, where

$$\begin{aligned} \llbracket M \rrbracket \* \langle \xi\_1, \dots, \xi\_\ell \rangle &: \Theta \times \mathbb{R}^{n + |\Sigma|} \to \llbracket \tau \rrbracket \\ (\theta, \mathbf{s} \mathbin{+\!\!+} \mathbf{s'}) &\mapsto \llbracket M \rrbracket ((\theta, \xi\_1(\theta, \mathbf{s}), \dots, \xi\_\ell(\theta, \mathbf{s})), \mathbf{s'}) \end{aligned}$$

It is worth noting that, in contrast to more standard fundamental lemmas, here we need to capture the dependency of the free variables on some number n of further samples. E.g. in the context of (λx. x) sample<sub>N</sub> the subterm x depends on a sample although this is not apparent if we consider x in isolation.

Lemma 5 is proven by structural induction [18]. The most interesting cases include: parameters, primitive operations and conditionals. In the case for parameters we exploit the compactness of Θ (Assumption 1). For primitive operations we note that as a consequence of Lemma 3 each P<sup>(n)</sup><sub>ι</sub> is closed under negation<sup>5</sup>, addition and multiplication. Finally, for conditionals we exploit Lemma 4.

Type Soundness II: Correctness of SGD. Next, we address the integrability for the smoothed problem as well as (SGD1) to (SGD3). We establish that not only ⟦M⟧<sub>η</sub> but also its partial derivatives up to order 2 are uniformly dominated by functions with finite moments. For this to possibly hold we require:

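Assumption 2 For every η > 0,

$$\sup\_{x \in \mathbb{R}} |\sigma\_{\eta}(x)| < \infty \qquad \sup\_{x \in \mathbb{R}} |\sigma\_{\eta}'(x)| < \infty \qquad \sup\_{x \in \mathbb{R}} |\sigma\_{\eta}''(x)| < \infty$$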

Note that, for example, the logistic sigmoid satisfies Assumption 2.
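For the logistic choice σ<sub>η</sub>(x) = σ(x/η) this can be verified by direct differentiation. Writing σ as shorthand for σ(x/η) ∈ (0, 1) on the right-hand sides:

$$\begin{aligned} |\sigma\_{\eta}(x)| &= \sigma \le 1 \\ |\sigma\_{\eta}'(x)| &= \frac{1}{\eta} \, \sigma (1 - \sigma) \le \frac{1}{4\eta} \\ |\sigma\_{\eta}''(x)| &= \frac{1}{\eta^2} \, \sigma (1 - \sigma) \, |1 - 2\sigma| \le \frac{1}{4\eta^2} \end{aligned}$$

The bounds use σ(1 − σ) ≤ 1/4 and |1 − 2σ| ≤ 1. They are finite for every fixed η > 0 but grow as η decreases.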

We can then prove a fundamental lemma similar to Lemma 5, mutatis mutandis, using a logical predicate in VectFr. We stipulate f ∈ Q<sup>(n)</sup><sub>ι</sub> if its partial derivatives up to order 2 are uniformly dominated by a function with finite moments. In addition to Lemma 3 we exploit standard rules for differentiation (such as the sum, product and chain rule) as well as Assumption 2. We conclude:

Proposition 4. If θ | Σ ⊢<sub>poly</sub> M : R then the partial derivatives up to order 2 of ⟦M⟧<sub>η</sub> are uniformly dominated by a function with all finite moments.

Consequently, the Smoothed Optimisation Problem 2 is not only well-defined but, by the dominated convergence theorem [21, Theorem 6.28], the reparameterisation gradient estimator is unbiased. Furthermore, (SGD1) to (SGD3) are satisfied and SGD is correct.

Discussion. The type system ⊢<sub>poly</sub> is simple yet guarantees correctness of SGD. However, it is somewhat restrictive; in particular, it does not allow the expression of many ELBOs arising in variational inference directly as they often have the form of logarithms of exponential terms (cf. Example 2).

<sup>5</sup> for ι = R

#### 4.3 A Generic Type System with Annotations

Next, we present a generic type system with annotations. In Section 4.4 we give an instantiation to make ⊢<sub>poly</sub> more permissible and in Section 5 we turn towards a different property: the uniform convergence of the smoothings.

Typing judgements have the form Γ | Σ ⊢<sub>?</sub> M : τ, where "?" indicates the property we aim to establish, and we annotate base types. Thus, types are generated from


Annotations are drawn from a set and may possibly be restricted for safe types. Secondly, the trace types are now annotated with variables, typically Σ = [s<sub>1</sub> ∼ D<sub>1</sub>, …, s<sub>n</sub> ∼ D<sub>n</sub>] where the variables s<sub>j</sub> are pairwise distinct.

For the subtyping relation we can constrain the annotations at the base type level [18]; the extension to higher types is accomplished as before.

The typing rules have the same form but they are extended with the annotations on base types and side conditions possibly constraining them. For example, the rules for addition, exponentiation and sampling are modified as follows:

(Rules for addition, exponentiation and sampling, carrying the side conditions (cond. Add), (cond. Exp) and (cond. Sample) respectively; cf. [18].)

The rules for subtyping, variables, abstractions and applications do not need to be changed at all but they use annotated types instead of the types of Section 2.2.

$$\frac{\Gamma \mid \Sigma \vdash\_{?} M : \tau}{\Gamma' \mid \Sigma \vdash\_{?} M : \tau'}\ \Gamma \sqsubseteq\_{?} \Gamma', \tau \sqsubseteq\_{?} \tau'$$

$$\frac{\Gamma, y : \tau\_{1} \mid \Sigma \vdash\_{?} M : \tau\_{2}}{\Gamma \mid [] \vdash\_{?} \lambda y.M : \tau\_{1} \bullet \Sigma \rightarrow \tau\_{2}} \qquad \frac{\Gamma \mid \Sigma\_{2} \vdash\_{?} M : \tau\_{1} \bullet \Sigma\_{3} \rightarrow \tau\_{2} \quad \Gamma \mid \Sigma\_{1} \vdash\_{?} N : \tau\_{1}}{\Gamma \mid \Sigma\_{1} \mathbin{+\!\!+} \Sigma\_{2} \mathbin{+\!\!+} \Sigma\_{3} \vdash\_{?} M \, N : \tau\_{2}}$$

The full type system is presented in [18].

⊢<sub>poly</sub> can be considered a special case of ⊢<sub>?</sub> whereby we use the singleton ∗ as annotations, a contradictory side condition (such as false) for the undesired primitives <sup>−1</sup>, exp and log, and use the side condition "D has finite moments" for sample as above.

Table 1 provides an overview of the type systems of this paper and their purpose. ⊢<sub>?</sub> and its instantiations refine the basic type system of Section 2.2 in the sense that if a term-in-context is provable in the annotated type system, then its erasure (i.e. erasure of the annotations of base types and distributions) is provable in the basic type system. This is straightforward to check.


Table 1: Overview of type systems in this paper.


Fig. 4: Excerpt of the typing rules (cf. [18]) for the correctness of SGD.

#### 4.4 A More Permissible Type System

In this section we discuss another instantiation, ⊢<sub>SGD</sub>, of the generic type system to guarantee (SGD0) to (SGD3), which is more permissible than ⊢<sub>poly</sub>. In particular, we would like to support Example 2, which uses logarithms and densities involving exponentials. Intuitively, we need to ensure that subterms involving exp are "neutralised" by a corresponding log. To achieve this we annotate base types with 0 or 1, ordered discretely. 0 is the only annotation for safe base types and can be thought of as "integrable"; 1 denotes "needs to be passed through log". More precisely, we constrain the typing rules such that if θ | Σ ⊢<sub>SGD</sub> M : ι<sup>(e)</sup> then<sup>6</sup> log<sup>e</sup> ∘ ⟦M⟧ and the partial derivatives of log<sup>e</sup> ∘ ⟦M⟧<sub>η</sub> up to order 2 are uniformly dominated by a function with finite moments.

We subtype base types as follows: ι<sub>1</sub><sup>(e<sub>1</sub>)</sup> ⊑<sub>SGD</sub> ι<sub>2</sub><sup>(e<sub>2</sub>)</sup> if ι<sub>1</sub> ⊑ ι<sub>2</sub> (as defined in Fig. 3a) and e<sub>1</sub> = e<sub>2</sub>, or ι<sub>1</sub> = R<sub>&gt;0</sub> = ι<sub>2</sub> and e<sub>1</sub> ≤ e<sub>2</sub>. The second disjunct may come as a surprise but we ensure that terms of type R<sub>&gt;0</sub><sup>(0)</sup> cannot depend on samples at all.

In Fig. 4 we list the most important rules; we relegate the full type system to [18]. exp and log increase and decrease the annotation respectively. The rules for the primitive operations and conditionals are motivated by the closure properties of Lemma 3 and the elementary fact that log ∘ (f · g) = (log ∘ f) + (log ∘ g) and log ∘ (f<sup>−1</sup>) = −log ∘ f for f, g : Θ × R<sup>n</sup> → R.

<sup>6</sup> Using the convention that log<sup>0</sup> is the identity.

Example 4. θ : R<sub>&gt;0</sub><sup>(0)</sup> | [N, N] ⊢<sub>SGD</sub> log (θ<sup>−1</sup> · exp (sample<sub>N</sub>)) + sample<sub>N</sub> : R<sup>(0)</sup>

Note that the branches of conditionals need to have safe type, which rules out branches with type R<sup>(1)</sup>. This is because logarithms do not behave nicely when composed with addition as used in the smoothed interpretation of conditionals.

Besides, observe that in the rules for logarithm and inverses e = 0 is allowed, which may come as a surprise<sup>7</sup>. This is e.g. necessary for the typability of the variational inference Example 2:

Example 5 (Typing for Variational Inference). It holds that | [] ⊢ N : R<sup>(0)</sup> → R<sup>(0)</sup> → R<sub>&gt;0</sub><sup>(0)</sup> → R<sub>&gt;0</sub><sup>(1)</sup> and θ : R<sup>(0)</sup> | [s<sub>1</sub> ∼ N] ⊢ M : R<sup>(0)</sup>.

Type Soundness. To formally establish type soundness, we can use a logical predicate, which is very similar to the one in Section 4.2 (N.B. the additional Item 2): in particular f ∈ Q<sup>(n)</sup><sub>ι<sup>(e)</sup></sub> if


Using this and a similar logical predicate for ⟦(−)⟧ we can show:

Proposition 5. If θ<sub>1</sub> : ι<sub>1</sub><sup>(0)</sup>, . . . , θ<sub>m</sub> : ι<sub>m</sub><sup>(0)</sup> | Σ ⊢<sub>SGD</sub> M : ι<sup>(0)</sup> then


Consequently, the Smoothed Optimisation Problem 2 is again not only well-defined but, by the dominated convergence theorem, the reparameterisation gradient estimator is unbiased. Furthermore, (SGD1) to (SGD3) are satisfied and SGD is correct.

# 5 Uniform Convergence

In the preceding section we have shown that SGD with the reparameterisation gradient can be employed to correctly (in the sense of Proposition 3) solve the Smoothed Optimisation Problem 2 for any fixed accuracy coefficient. However, a priori, it is not clear how a solution of the Smoothed Problem 2 can help to solve the original Problem 1.

The following illustrates the potential for significant discrepancies:

<sup>7</sup> Recall that terms of type R<sub>&gt;0</sub><sup>(0)</sup> cannot depend on samples.

Example 6. Consider M ≡ if 0 < 0 then θ · θ + 1 else (θ − 1) · (θ − 1). Notice that the global minimum and the only stationary point of ⟦M⟧<sub>η</sub> is at θ = 1/2 regardless of η > 0, where ⟦M⟧<sub>η</sub>(1/2) = 3/4. On the other hand ⟦M⟧(1/2) = 1/4 and the global minimum of ⟦M⟧ is at θ = 1.
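The discrepancy in Example 6 can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that σ<sub>η</sub> is the logistic sigmoid x ↦ 1/(1 + exp(−x/η)) and that a conditional `if L < 0 then M else N` is smoothed to σ<sub>η</sub>(−L) · M + σ<sub>η</sub>(L) · N; all function names are hypothetical.

```python
import math

def sigma(x, eta):
    """Assumed smoothing: logistic sigmoid with accuracy coefficient eta."""
    return 1.0 / (1.0 + math.exp(-x / eta))

def m_exact(theta):
    """Standard semantics of M: the guard 0 < 0 is false, so the else branch is taken."""
    return (theta - 1.0) * (theta - 1.0)

def m_smoothed(theta, eta):
    """Smoothed semantics of M: both branches are weighted by sigma at the guard value 0."""
    guard = 0.0
    then_branch = theta * theta + 1.0
    else_branch = (theta - 1.0) * (theta - 1.0)
    return sigma(-guard, eta) * then_branch + sigma(guard, eta) * else_branch

def argmin_on_grid(f, lo=-2.0, hi=2.0, steps=4001):
    """Crude grid search for the minimiser of f on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=f)

for eta in (0.1, 0.01, 0.001):
    theta_star = argmin_on_grid(lambda t: m_smoothed(t, eta))
    print("eta", eta, "smoothed minimiser", theta_star,
          "smoothed value", m_smoothed(theta_star, eta))
print("exact minimiser", argmin_on_grid(m_exact), "exact value at 1/2", m_exact(0.5))
```

Because the guard is identically 0, both branches always receive weight σ<sub>η</sub>(0) = 1/2, so decreasing η does not move the smoothed minimiser towards the true one.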

In this section we investigate under which conditions the smoothed objective function converges to the original objective function uniformly in θ ∈ Θ:

$$\mathbb{E}\_{\mathbf{s}\sim\mathcal{D}}[\llbracket M\rrbracket\_{\eta}(\theta,\mathbf{s})] \xrightarrow{unif.} \mathbb{E}\_{\mathbf{s}\sim\mathcal{D}}[\llbracket M\rrbracket(\theta,\mathbf{s})] \qquad \text{as }\eta\searrow0 \text{ for }\theta\in\Theta \tag{Unif}$$

We design a type system guaranteeing this.

The practical significance of uniform convergence is that before running SGD, for every error tolerance ε > 0 we can find an accuracy coefficient η > 0 such that the difference between the smoothed and original objective function does not exceed ε, in particular for θ<sup>∗</sup> delivered by the SGD run for the η-smoothed problem.

Discussion of Restrictions. To rule out the pathology of Example 6 we require that guards are non-0 almost everywhere.

Furthermore, as a consequence of the uniform limit theorem [29], (Unif) can only possibly hold if the expectation E<sub>s∼D</sub>[⟦M⟧(θ, s)] is continuous (as a function of the parameters θ). For a straightforward counterexample take M ≡ if θ < 0 then 0 else 1: we have E<sub>s</sub>[⟦M⟧(θ)] = [θ ≥ 0], which is not even continuous, let alone differentiable, at θ = 0. Our approach is to require that guards do not depend directly on parameters but they may do so, indirectly, via a diffeomorphic<sup>8</sup> reparameterisation transform; see Example 8. We call such guards safe.

In summary, our aim, intuitively, is to ensure that guards are the composition of a diffeomorphic transformation of the random samples (potentially depending on parameters) and a function which is non-zero almost everywhere.

#### 5.1 Type System for Guard Safety

In order to enforce this requirement and to make the transformation more explicit, we introduce syntactic sugar, transform sample<sub>D</sub> by T, for applications of the form T sample<sub>D</sub>.

Example 7. As expressed in Eq. (2), we can obtain samples from N(µ, σ<sup>2</sup>) via transform sample<sub>N</sub> by (λs. s · σ + µ), which is syntactic sugar for the term (λs. s · σ + µ) sample<sub>N</sub>.
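The location-scale transformation of Example 7 can be sketched in a few lines. This is an illustration only: `transform_sample` and `location_scale` are hypothetical names, the standard normal is drawn with Python's `random` module, and σ is assumed to be strictly positive.

```python
import random
import statistics

def transform_sample(transform, draw):
    """Sketch of `transform sample_D by T`: draw s from D, then apply T to it."""
    return transform(draw())

def location_scale(mu, sigma):
    """The transformation (lambda s. s * sigma + mu); sigma is assumed to be > 0."""
    return lambda s: s * sigma + mu

rng = random.Random(0)                      # fixed seed for reproducibility
standard_normal = lambda: rng.gauss(0.0, 1.0)

samples = [transform_sample(location_scale(2.0, 0.5), standard_normal)
           for _ in range(20000)]
print("empirical mean", statistics.mean(samples))
print("empirical standard deviation", statistics.stdev(samples))
```

The parameters µ and σ enter only through the transformation, while the randomness comes from a fixed distribution; this separation is what the reparameterisation gradient relies on.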

We propose another instance of the generic type system of Section 4.3, ⊢<sub>unif</sub>, where we annotate base types by α = (g, ∆), where g ∈ {f, t} denotes whether we seek to establish guard safety and ∆ is a finite set of s<sub>j</sub> capturing possible dependencies on samples. We subtype base types as follows: ι<sub>1</sub><sup>(g<sub>1</sub>,∆<sub>1</sub>)</sup> ⊑<sub>unif</sub> ι<sub>2</sub><sup>(g<sub>2</sub>,∆<sub>2</sub>)</sup> if ι<sub>1</sub> ⊑ ι<sub>2</sub> (as defined in Fig. 3a), ∆<sub>1</sub> ⊆ ∆<sub>2</sub> and g<sub>1</sub> ⪯ g<sub>2</sub>, where t ⪯ f. This is motivated by the intuition that we can always drop<sup>9</sup> guard safety and add more dependencies.

<sup>8</sup> [18, Example 12] illustrates why it is not sufficient to restrict the reparameterisation transform to bijections (rather, we require it to be a diffeomorphism).

The rule for conditionals ensures that only safe guards are used. The unary operations preserve variable dependencies and guard safety. Parameters and constants are not guard safe and depend on no samples (see [18] for the full type system):

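$$\frac{\Gamma \mid \Sigma \vdash\_{\mathrm{unif}} L : \iota^{(\mathbf{t},\Delta)} \quad \Gamma \mid \Sigma' \vdash\_{\mathrm{unif}} M : \sigma \quad \Gamma \mid \Sigma'' \vdash\_{\mathrm{unif}} N : \sigma}{\Gamma \mid \Sigma \mathbin{+\!\!+} \Sigma' \mathbin{+\!\!+} \Sigma'' \vdash\_{\mathrm{unif}} \text{if } L < 0 \text{ then } M \text{ else } N : \sigma}$$

$$\mid [] \vdash\_{\mathrm{unif}} - : R^{(g,\Delta)} \to R^{(g,\Delta)} \qquad \theta\_i : \iota^{(\mathbf{f},\emptyset)} \mid [] \vdash\_{\mathrm{unif}} \theta\_i : \iota^{(\mathbf{f},\emptyset)} \qquad \mid [] \vdash\_{\mathrm{unif}} r : \iota^{(\mathbf{f},\emptyset)} \quad r \in \llbracket \iota \rrbracket$$

$$\frac{\theta \mid [] \vdash\_{\mathrm{unif}} T : R^{\alpha} \to R^{\alpha}}{\theta \mid [s\_j \sim \mathcal{D}] \vdash\_{\mathrm{unif}} \text{transform sample}\_{\mathcal{D}} \text{ by } T : R^{(\mathbf{t},\{s\_j\})}}\ T \text{ diffeomorphic}$$

A term θ | [] ⊢<sub>unif</sub> T : R<sup>α</sup> → R<sup>α</sup> is diffeomorphic if ⟦T⟧(θ, []) = ⟦T⟧<sub>η</sub>(θ, []) : R → R is a diffeomorphism for each θ ∈ Θ, i.e. differentiable and bijective with differentiable inverse.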

First, we can express affine transformations, in particular, the location-scale transformations as in Example 7:

Example 8 (Location-Scale Transformation). The term-in-context

$$\sigma: R\_{>0}^{(\mathbf{f},\emptyset)}, \mu: R^{(\mathbf{f},\emptyset)} \mid [] \vdash \lambda s. \sigma \cdot s + \mu: R^{(\mathbf{f},\{s\_1\})} \to R^{(\mathbf{f},\{s\_1\})}$$

is diffeomorphic. (However, for σ : R<sup>(f,∅)</sup> it is not because it admits σ = 0.) Hence, the reparameterisation transform

$$G \equiv \sigma : R\_{>0}^{(\mathbf{f}, \emptyset)}, \mu : R^{(\mathbf{f}, \emptyset)} \mid [s\_1 \sim \mathcal{D}] \vdash \text{transform sample}\_{\mathcal{D}} \text{ by } (\lambda s. s \cdot \sigma + \mu) : R^{(\mathbf{t}, \{s\_1\})}$$

which has g-flag t, is admissible as a guard term. Notice that G depends on the parameters, σ and µ, indirectly through a diffeomorphism, which is permitted by the type system.

If guard safety is sought to be established for the binary operations, we require that operands do not share dependencies on samples:

$$\mid [] \vdash\_{\mathrm{unif}} \circ : \iota^{(\mathbf{f},\Delta)} \to \iota^{(\mathbf{f},\Delta)} \to \iota^{(\mathbf{f},\Delta)} \qquad \circ \in \{+, \cdot\}$$

$$\mid [] \vdash\_{\mathrm{unif}} \circ : \iota^{(\mathbf{t},\Delta\_1)} \to \iota^{(\mathbf{t},\Delta\_2)} \to \iota^{(\mathbf{t},\Delta\_1 \cup \Delta\_2)} \qquad \circ \in \{+, \cdot\},\ \Delta\_1 \cap \Delta\_2 = \emptyset$$

This is designed to address:

<sup>9</sup> as long as it is not used in guards

Example 9 (Non-Constant Guards). We have | [] ⊢ (λx. x + (−x)) : R<sup>(f,{s<sub>1</sub>})</sup> → R<sup>(f,{s<sub>1</sub>})</sup>, noting that we must use g = f for the + rule; and because R<sup>(t,{s<sub>j</sub>})</sup> ⊑<sub>unif</sub> R<sup>(f,{s<sub>j</sub>})</sup>, we have

$$\mid [] \vdash (\lambda x. x + (-x)) : R^{(\mathbf{t},\{s\_1\})} \to R^{(\mathbf{f},\{s\_1\})} .$$

Now transform sample<sub>D</sub> by (λy. y) has type R<sup>(t,{s<sub>1</sub>})</sup> with the g-flag necessarily set to t; and so the term

$$M \equiv (\lambda x. x + (-x)) \, (\text{transform sample}\_{\mathcal{D}} \text{ by } (\lambda y. y)),$$

which denotes 0, has type R<sup>(f,{s<sub>1</sub>})</sup>, but not R<sup>(t,{s<sub>1</sub>})</sup>. It follows that M cannot be used in guards (notice the side condition of the rule for conditionals), which is as desired: recall Example 6. Similarly consider the term

$$N \equiv (\lambda x.(\lambda y \, z.\ \text{if } y + (-z) < 0 \text{ then } M\_1 \text{ else } M\_2) \, x \, x) \, (\text{transform sample}\_{\mathcal{D}} \text{ by } (\lambda y. y)) \tag{7}$$

When evaluated, the term y + (−z) in the guard has denotation 0. For the same reason as above, the term N is not refinement typable.
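The way annotations (g, ∆) propagate through the operations in Example 9 can be illustrated with a small sketch. This is a deliberately simplified, first-order reading of the rules above and not the type system of [18]: it computes one annotation per expression, uses subtyping implicitly to weaken operands, and all names (`Ann`, `sample`, `constant`, `neg`, `binop`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ann:
    guard_safe: bool   # the g-flag: True stands for t, False for f
    deps: frozenset    # the set Delta of sample variables the term may depend on

def sample(name):
    """Annotation of `transform sample_D by T` for a diffeomorphic T: (t, {s_j})."""
    return Ann(True, frozenset({name}))

def constant():
    """Parameters and constants: not guard safe, no dependencies on samples."""
    return Ann(False, frozenset())

def neg(a):
    """Unary operations preserve dependencies and guard safety."""
    return a

def binop(a, b):
    """Annotation of a + b or a * b.

    Guard safety survives only if both operands are guard safe and their
    dependencies are disjoint; otherwise both operands are weakened to
    (f, Delta_1 union Delta_2) via subtyping.
    """
    disjoint = not (a.deps & b.deps)
    return Ann(a.guard_safe and b.guard_safe and disjoint, a.deps | b.deps)

x = sample("s1")
print(binop(x, neg(x)))                    # x + (-x): shared dependency on s1
print(binop(sample("s1"), sample("s2")))   # two distinct samples
print(binop(binop(constant(), x), neg(x))) # 2 * x + (-x), cf. Example 10
```

Under these assumptions x + (−x) loses its guard-safety flag because both operands depend on s<sub>1</sub>, which mirrors why M and N above are rejected as guards.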

The type system is, however, incomplete in the sense that there are terms-in-context that satisfy the property (Unif) but which are not typable.

Example 10 (Incompleteness). The following term-in-context denotes the "identity":

$$[\ ] \vdash (\lambda x.(\underline{2} \cdot x) + (-x)) : R^{(\mathbf{t}, \{s\_1\})} \to R^{(\mathbf{f}, \{s\_1\})}$$

but it does not have type R<sup>(t,{s<sub>1</sub>})</sup> → R<sup>(t,{s<sub>1</sub>})</sup>. Then, using the same reasoning as Example 9, the term

$$G \equiv \left(\lambda x.(\underline{2} \cdot x) + (-x)\right) \left(\mathsf{transform} \, \mathsf{sample}\_{\mathcal{D}} \, \mathsf{by}\left(\lambda y. y\right)\right)$$

has type R<sup>(f,{s<sub>1</sub>})</sup>, but not R<sup>(t,{s<sub>1</sub>})</sup>, and so if G < 0 then 0 else 1 is not typable, even though G can safely be used in guards.

#### 5.2 Type Soundness

Henceforth, we fix parameters θ<sub>1</sub> : ι<sub>1</sub><sup>(f,∅)</sup>, . . . , θ<sub>m</sub> : ι<sub>m</sub><sup>(f,∅)</sup>.

Now, we address how to show property (Unif), i.e. that for θ | Σ ⊢<sub>unif</sub> M : ι<sup>(g,∆)</sup>, the η-smoothed E[⟦M⟧<sub>η</sub>(θ, s)] converges uniformly for θ ∈ Θ as η ↘ 0. For this to hold we clearly need to require that σ<sub>η</sub> has good (uniform) convergence properties (as far as the unavoidable discontinuity at 0 allows for):

Assumption 3. For every δ > 0,

$$\sigma\_{\eta} \xrightarrow{unif.} [(-) > 0] \qquad \text{on } (-\infty, -\delta) \cup (\delta, \infty).$$

Observe that in general even if M is typable ⟦M⟧<sub>η</sub> does not converge uniformly in both θ and s because ⟦M⟧ may still be discontinuous in s:

Example 11. For M ≡ if (transform sample<sub>N</sub> by (λs. s + θ)) < 0 then 0 else 1, ⟦M⟧(θ, s) = [s + θ ≥ 0], which is discontinuous, and ⟦M⟧<sub>η</sub>(θ, s) = σ<sub>η</sub>(s + θ).

However, if θ | Σ ⊢ M : ι<sup>(g,∆)</sup> then ⟦M⟧<sub>η</sub> does converge to ⟦M⟧ uniformly almost uniformly, i.e., uniformly in θ ∈ Θ and almost uniformly in s ∈ R<sup>n</sup>. Formally, we define:

Definition 4. Let f, f<sub>η</sub> : Θ × R<sup>n</sup> → R and let µ be a measure on R<sup>n</sup>. We say that f<sub>η</sub> converges uniformly almost uniformly to f (notation: f<sub>η</sub> →<sup>u.a.u.</sup> f) if there exist sequences (δ<sub>k</sub>)<sub>k∈N</sub>, (ε<sub>k</sub>)<sub>k∈N</sub> and (η<sub>k</sub>)<sub>k∈N</sub> such that lim<sub>k→∞</sub> δ<sub>k</sub> = 0 = lim<sub>k→∞</sub> ε<sub>k</sub>; and for every k ∈ N and θ ∈ Θ there exists U ⊆ R<sup>n</sup> such that

1. µ(U) < δ<sub>k</sub> and

2. for every 0 < η < η<sub>k</sub> and s ∈ R<sup>n</sup> \ U, |f<sub>η</sub>(θ, s) − f(θ, s)| < ε<sub>k</sub>.

If f, f<sub>η</sub> are independent of θ this notion coincides with standard almost uniform convergence. For M from Example 11, ⟦M⟧<sub>η</sub> →<sup>u.a.u.</sup> ⟦M⟧ holds although uniform convergence fails.
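The difference between the two notions of convergence for Example 11 can be observed numerically. The sketch below assumes the logistic sigmoid for σ<sub>η</sub> and writes x for s + θ; the helper names and the grid are hypothetical choices made for illustration. It compares the largest error over all grid points with the largest error once a δ-neighbourhood of the discontinuity is excluded.

```python
import math

def sigma(x, eta):
    """Assumed smoothing: numerically stable logistic sigmoid of x / eta."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def error(x, eta):
    """|sigma_eta(x) - [x >= 0]|, where x stands for s + theta."""
    return abs(sigma(x, eta) - (1.0 if x >= 0 else 0.0))

def sup_error(eta, delta, radius=5.0, steps=10001):
    """Largest error over grid points x in [-radius, radius] with |x| >= delta."""
    grid = [-radius + 2 * radius * i / (steps - 1) for i in range(steps)]
    return max(error(x, eta) for x in grid if abs(x) >= delta)

for eta in (0.1, 0.01, 0.001):
    print("eta", eta,
          "sup over all x:", sup_error(eta, delta=0.0),
          "sup outside |x| < 0.05:", sup_error(eta, delta=0.05))
```

The error at x = 0 stays at 1/2 for every η, so the convergence is not uniform in s. Outside the excluded set U = {s : |s + θ| < δ} the error is bounded by σ<sub>η</sub>(−δ), independently of θ, and for a standard normal sample the measure of U is at most 2δ/√(2π), again independently of θ.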

However, uniform almost uniform convergence entails uniform convergence of expectations:

Lemma 6. Let f, f<sub>η</sub> : Θ × R<sup>n</sup> → R have finite moments. If f<sub>η</sub> →<sup>u.a.u.</sup> f then E<sub>s∼D</sub>[f<sub>η</sub>(θ, s)] →<sup>unif.</sup> E<sub>s∼D</sub>[f(θ, s)].

As a consequence, it suffices to establish ⟦M⟧<sub>η</sub> →<sup>u.a.u.</sup> ⟦M⟧. We achieve this by positing an infinitary logical relation between sequences of morphisms in VectFr (corresponding to the smoothings) and morphisms in QBS (corresponding to the measurable standard semantics). We then prove a fundamental lemma (details are in [18]). Not surprisingly, the case for conditionals is the most interesting. It makes use of Assumption 3 and exploits that guards, for which the typing rules assert the guard safety flag to be t, can only be 0 on sets of measure 0. We conclude:

Theorem 1. If θ<sub>1</sub> : ι<sub>1</sub><sup>(f,∅)</sup>, . . . , θ<sub>m</sub> : ι<sub>m</sub><sup>(f,∅)</sup> | Σ ⊢<sub>unif</sub> M : R<sup>(g,∆)</sup> then ⟦M⟧<sub>η</sub> →<sup>u.a.u.</sup> ⟦M⟧. In particular, if ⟦M⟧<sub>η</sub> and ⟦M⟧ also have finite moments then

$$\mathbb{E}\_{\mathbf{s}\sim\mathcal{D}}[\llbracket M\rrbracket\_{\eta}(\theta,\mathbf{s})] \xrightarrow{unif.} \mathbb{E}\_{\mathbf{s}\sim\mathcal{D}}[\llbracket M\rrbracket(\theta,\mathbf{s})] \qquad\qquad\text{as }\eta\searrow0 \text{ for }\theta\in\Theta$$
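For the term M of Example 11 the statement of Theorem 1 can be illustrated numerically; this does not replace the proof. The sketch assumes a standard normal sample, the logistic sigmoid for σ<sub>η</sub>, and a plain trapezoidal rule; all helper names are hypothetical. The exact objective is E[[s + θ ≥ 0]] = Φ(θ), the standard normal distribution function.

```python
import math

def sigma(x, eta):
    """Assumed smoothing: numerically stable logistic sigmoid of x / eta."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def normal_pdf(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Exact objective: E[[s + theta >= 0]] = P(s >= -theta) = Phi(theta)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_expectation(theta, eta, radius=8.0, steps=8001):
    """Trapezoidal approximation of E_{s ~ N(0,1)}[sigma_eta(s + theta)]."""
    h = 2.0 * radius / (steps - 1)
    total = 0.0
    for i in range(steps):
        s = -radius + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * sigma(s + theta, eta) * normal_pdf(s)
    return total * h

def sup_gap(eta, thetas):
    """Largest gap between smoothed and exact objective over the given parameters."""
    return max(abs(smoothed_expectation(t, eta) - normal_cdf(t)) for t in thetas)

thetas = [-3.0 + 0.5 * i for i in range(13)]
for eta in (0.2, 0.1, 0.05):
    print("eta", eta, "sup gap over theta grid", sup_gap(eta, thetas))
```

On this grid the largest gap shrinks as η decreases, which is the behaviour (Unif) describes; the grid of parameters and the integration bounds are choices of this sketch only.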

We finally note that ⊢<sub>unif</sub> can be made more permissible by adding syntactic sugar for a-fold (for a ∈ N<sub>&gt;0</sub>) addition a · M ≡ M + · · · + M and multiplication M<sup>a</sup> ≡ M · ⋯ · M. This admits more terms as guards, but safely [18].

# 6 Related Work

[23] is both the starting point for our work and the most natural source for comparison. They correct the (biased) reparameterisation gradient estimator for non-differentiable models by additional non-trivial boundary terms. They present an efficient method for affine guards only. Besides, they are not concerned with the convergence of gradient-based optimisation procedures; nor do they discuss how assumptions they make may be manifested in a programming language.

In the context of the reparameterisation gradient, [25] and [17] relax discrete random variables in a continuous way, effectively dealing with a specific class of discontinuous models. [39] use a similar smoothing for discontinuous optimisation but they do not consider a full programming language.

Motivated by guaranteeing absolute continuity (which is a necessary but not sufficient criterion for the correctness of e.g. variational inference), [24] use an approach similar to our trace types to track the samples which are drawn. They do not support standard conditionals but their "work-around" is also eager in the sense of combining the traces of both branches. Besides, they do not support a full higher-order language, in which higher-order terms can draw samples. Thus, they do not need to consider function types tracking the samples drawn during evaluation.

# 7 Empirical Evaluation

We evaluate our smoothed gradient estimator (Smooth) against the biased reparameterisation estimator (Reparam), the unbiased correction of it (Lyy18) due to [23], and the unbiased score estimator (Score) [31,38,27]. The experimental setup is based on that of [23]. The implementation is written in Python, using automatic differentiation (provided by the jax library) to implement each of the above estimators for an arbitrary probabilistic program. For each estimator and model, we used the Adam [19] optimiser for 10,000 iterations with a learning rate of 0.001, with the exception of xornet for which we used 0.01. The initial model parameters θ<sub>0</sub> were fixed for each model across all runs. In each iteration, we used N = 16 Monte Carlo samples from the gradient estimator. For the Lyy18 estimator, a single subsample for the boundary term was used in each estimate. For our smoothed estimator we use accuracy coefficients η ∈ {0.1, 0.15, 0.2}. Further details are discussed in [18, Appendix E.1].

Compilation for First-Order Programs. All our benchmarks are first-order. We compile a potentially discontinuous program to a smooth program (parameterised by σ<sub>η</sub>) using the compatible closure of

$$\text{if } L < 0 \text{ then } M \text{ else } N \leadsto (\lambda w. \sigma\_\eta(-w) \cdot M + \sigma\_\eta(w) \cdot N) \, L.$$

Note that the size only increases linearly and that we avoid an exponential blow-up by using abstractions rather than duplicating the guard L.
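The compilation step can be sketched on a toy term representation. This is not the authors' jax-based implementation: terms are nested tuples in a hypothetical encoding, the abstraction applied to the guard is represented by a `let` node, and σ<sub>η</sub> is assumed to be the logistic sigmoid.

```python
import itertools
import math

def sigma(x, eta):
    """Assumed smoothing: numerically stable logistic sigmoid of x / eta."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Source terms (hypothetical encoding):
#   ("const", r), ("var", name), ("neg", a), ("add", a, b), ("mul", a, b),
#   ("if", guard, then_term, else_term)   meaning: if guard < 0 then ... else ...
# Target terms additionally use:
#   ("let", name, bound, body)            standing for (lambda name. body) bound
#   ("sig", a)                            standing for sigma_eta(a)
_fresh = itertools.count()

def smooth(term):
    """Rewrite every conditional of a source term to (lambda w. sig(-w)*M + sig(w)*N) L."""
    tag = term[0]
    if tag in ("const", "var"):
        return term
    if tag == "if":
        guard, then_t, else_t = (smooth(t) for t in term[1:])
        w = f"w{next(_fresh)}"  # fresh name: the guard is bound once, not duplicated
        body = ("add",
                ("mul", ("sig", ("neg", ("var", w))), then_t),
                ("mul", ("sig", ("var", w)), else_t))
        return ("let", w, guard, body)
    return (tag,) + tuple(smooth(t) for t in term[1:])

def evaluate(term, env, eta):
    """Evaluate a source or target term in environment env."""
    tag = term[0]
    if tag == "const":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "neg":
        return -evaluate(term[1], env, eta)
    if tag == "add":
        return evaluate(term[1], env, eta) + evaluate(term[2], env, eta)
    if tag == "mul":
        return evaluate(term[1], env, eta) * evaluate(term[2], env, eta)
    if tag == "sig":
        return sigma(evaluate(term[1], env, eta), eta)
    if tag == "let":
        _, name, bound, body = term
        return evaluate(body, {**env, name: evaluate(bound, env, eta)}, eta)
    if tag == "if":
        branch = term[2] if evaluate(term[1], env, eta) < 0 else term[3]
        return evaluate(branch, env, eta)
    raise ValueError(f"unknown term: {tag}")

def size(term):
    """Number of nodes in a term."""
    return 1 + sum(size(t) for t in term[1:] if isinstance(t, tuple))

step = ("if", ("var", "theta"), ("const", 0.0), ("const", 1.0))
nested = ("if", ("var", "theta"), step, ("const", 2.0))
print(evaluate(step, {"theta": 0.3}, 0.1), evaluate(smooth(step), {"theta": 0.3}, 0.1))
print(size(nested), size(smooth(nested)))
```

In this encoding each conditional adds a fixed number of nodes, so the compiled term grows linearly with the number of conditionals; binding the guard to `w` is what avoids copying L into both summands.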

Models. We include the models from [23], an example from differential privacy [11] and a neural network for which our main competitor, the estimator of [23], is not applicable (see [18, Appendix E.2] for more details).

Fig. 5: ELBO trajectories for each model. A single colour is used for each estimator and the accuracy coefficient η = 0.1, 0.15, 0.2 for Smooth is represented by dashed, solid and dotted lines respectively.

#### Analysis of Results

We plot the ELBO trajectories in Fig. 5 and include data on the computational cost and work-normalised variance [8] in [18, Table 2]. (Variances can be improved in a routine fashion by e.g. taking more samples.)

The ELBO graphs for the temperature model in Fig. 5a and the cheating model in Fig. 5d show that the Reparam estimator is biased, converging to suboptimal values when compared to the Smooth and Lyy18 estimators. For temperature we can also see from the graph and the data in [18, Table 2a] that the Score estimator exhibits extremely high variance, and does not converge.

Finally, the xornet model shows the difficulty of training step-function based neural nets. The Lyy18 estimator is not applicable here since there are non-affine conditionals. In Fig. 5e, the Reparam estimator makes no progress while other estimators manage to converge to close to 0 ELBO, showing that they learn a network that correctly classifies all points. In particular, the Smooth estimator converges the quickest.

Summa summarum, the results reveal where the Reparam estimator is biased and that the Smooth estimator does not have the same limitation. Where the Lyy18 estimator is defined, the two converge to roughly the same objective value. Our smoothing approach generalises to more complex models such as neural networks with non-linear boundaries, and it is simpler and cheaper (there is no need to compute a correction term). Besides, our estimator has consistently and significantly lower work-normalised variance, by up to 3 orders of magnitude.

# 8 Conclusion and Future Directions

We have discussed a simple probabilistic programming language to formalise an optimisation problem arising e.g. in variational inference for probabilistic programming. We have endowed our language with a denotational (measurable) value semantics and a smoothed approximation of potentially discontinuous programs, which is parameterised by an accuracy coefficient. We have proposed type systems to guarantee pleasing properties in the context of the optimisation problem: For a fixed accuracy coefficient, stochastic gradient descent converges to stationary points even with the reparameterisation gradient (which is unbiased). Besides, the smoothed objective function converges uniformly to the true objective as the accuracy is improved.

Our type systems can be used to independently check these two properties to obtain partial theoretical guarantees even if one of the systems suffers from incompleteness. We also stress that SGD and the smoothed unbiased gradient estimator can even be applied to programs which are not typable.

Experiments with our prototype implementation confirm the benefits of reduced variance and unbiasedness. Compared to the unbiased correction of the reparameterised gradient estimator due to [23], our estimator has a similar convergence, but is simpler, faster, and attains orders of magnitude (2 to 3,000×) reduction in work-normalised variance.

Future Directions. A natural avenue for future research is to make the language and type systems more complete, i.e. to support more well-behaved programs, in particular programs involving recursion.

Furthermore, the choice of accuracy coefficients leaves room for further investigations. We anticipate it could be fruitful not to fix an accuracy coefficient upfront but to gradually enhance it during the optimisation either via a predetermined schedule (dependent on structural properties of the program), or adaptively.

# References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Type-safe Quantum Programming in Idris

Liliane-Joy Dandy<sup>1,2,3</sup>, Emmanuel Jeandel<sup>3</sup>, and Vladimir Zamdzhiev<sup>3,4</sup>

<sup>1</sup> EPFL, Lausanne, Switzerland, liliane-joy.dandy@epfl.ch

<sup>2</sup> École polytechnique, Palaiseau, France

<sup>3</sup> Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France, emmanuel.jeandel@loria.fr

<sup>4</sup> Université Paris-Saclay, CNRS, ENS Paris-Saclay, Inria, LMF, 91190, Gif-sur-Yvette, France, vladimir.zamdzhiev@inria.fr

Abstract. Variational Quantum Algorithms are hybrid classical-quantum algorithms where classical and quantum computation work in tandem to solve computational problems. These algorithms create interesting challenges for the design of suitable programming languages. In this paper we introduce Qimaera, which is a set of libraries for the Idris 2 programming language that enable the programmer to implement hybrid classical-quantum algorithms where the full power of the elegant Idris language works in synchrony with quantum programming primitives. The two key ingredients of Idris that make this possible are (1) dependent types which allow us to implement unitary quantum operations; and (2) linearity which allows us to enforce fine-grained control over the execution of quantum operations so that we may detect and reject many physically inadmissible programs. We also show that Qimaera is suitable for variational quantum programming by providing implementations of two prominent variational quantum algorithms – QAOA and VQE.

# 1 Introduction

Variational Quantum Algorithms [30,25,13] present a computational paradigm where hybrid classical-quantum algorithms work in tandem to solve computational problems. The classical part of the algorithm is performed by a classical processor and the quantum part of the algorithm is executed on a quantum device. During the computation process, intermediary results produced by the quantum device are passed onto the classical device which performs further computation on them that is used to tune the parameters of the quantum part of the algorithm, which therefore has an effect on the quantum dynamics. The hybrid classical-quantum back and forth process repeats until a desired termination condition is satisfied.

This hybrid classical-quantum computational paradigm opens up interesting and important challenges for the design of suitable programming languages. It is clear that if we wish to program within such computational scenarios, we need to develop a language that correctly models the manipulation of quantum resources. In particular, quantum measurements give rise to probabilistic computational effects that are inherited by the classical side of the language. Another issue is that quantum information behaves very differently compared to classical information. As an example, quantum information cannot be copied in a uniform way [36], unlike classical information, which may be freely copied without restriction. Therefore, if we wish to avoid runtime errors, the quantum fragment of the language needs to be equipped with features for fine-grained control, such as for example, having a substructural typing discipline [16,8,7,24,6] where contraction (i.e., copying) is restricted. On the other hand, when doing classical computation, such restrictions are unnecessary and often inconvenient. One solution to this problem is to design a language with a classical (non-linear) fragment together with a quantum (linear) one, both of which interact nicely with each other. In fact, this can be achieved within an existing language that has a sufficiently advanced type system, as we show in this paper.

Source code for Qimaera [1] and a full version of the paper [12] are available.

In this paper, we describe Qimaera (named after the hybrid creature Chimaera from Greek mythology), which is a set of libraries for the Idris 2 language [10] that allow the programmer to implement hybrid quantum-classical algorithms in a type-safe way. Idris 2 is an elegant functional programming language that is equipped with an advanced type system based on Quantitative Type Theory [24,6] that brings many useful features to the programmer, most notably dependent types and linearity. These two features of Idris are crucial for the development of Qimaera and, in fact, are the reason we chose Idris in the first place. Dependent types are used throughout our entire development in order to correctly represent and formalise the compositional nature of quantum operations. Linearity is used in order to enforce the proper consumption of quantum resources (during execution) in a way that is admissible with respect to the laws of quantum mechanics. The combination of dependent types and linearity allows us to statically detect and reject erroneous quantum programs and this ensures the type safety of our approach to variational quantum programming.

In our intended computational scenario, we have access to both a classical computer and a quantum computer. Since we cannot directly observe quantum information, we directly interact with the classical computer which sends instructions to, and receives data from, the quantum device via a suitable interface that makes use of the IO monad. In our view, this is a representation of a (perhaps simple) computational environment for hybrid quantum-classical programming. We design a suitable (abstract) interface that allows us to model this situation accurately and which makes use of the IO monad. However, since the authors do not personally have any quantum hardware, we provide only one concrete implementation of our interface that simulates the relevant quantum operations on our classical computers by using the proper linear-algebraic formalism, but while still using the IO monad as prescribed by the abstract interface. From a high-level programming perspective, the abstract interface addresses the programming challenges induced by the classical-quantum device scenario, but it ignores lower-level considerations (e.g., error correction).

We emphasise that we can achieve type-safe hybrid quantum-classical programming in an existing programming language by implementing suitable libraries. This is important for variational quantum programming, because in most variational quantum algorithms, the classical part of the algorithm is considerably larger, more complicated and more difficult to implement, compared to the quantum part of the algorithm. Therefore, it is important for the programming language to have first-class support for classical programming features. We think our chosen language, Idris, is such a language. The advanced type system of Idris allows us to elegantly mix quantum and classical programming primitives and therefore allows us to achieve our objectives. We demonstrate that Qimaera is suitable for variational quantum programming by providing implementations of the two most prominent variational quantum algorithms – QAOA and VQE. Moreover, our implementation of these algorithms has been achieved in a type-safe programming framework. By this we mean that common quantum programming errors (copying of qubits, applying a CNOT operation with the same source and target, etc.) are statically detected and rejected by the Idris type checker. We also note that being able to combine quantum and classical programming is important in other scenarios too (for instance in quantum cryptography).

Quantum Circuits vs Recursive Quantum Programs. We want to stress that the focus of our paper is not on quantum circuits, but on (recursive) quantum programs and algorithms. While some quantum algorithms may be seen as quantum circuits, there are algorithms which are more general, for example, repeat-until-success (see §5.2) and variational quantum algorithms (see §6). Such algorithms are not quantum circuits in the traditional understanding of this notion, and for them general recursion, probabilistic effects and classical computation might be important.

More specifically, general recursion is important, because many existing quantum algorithms are probabilistic and find the correct answer only with some probability. General recursion then allows the programmer to repeatedly run such an algorithm until the correct solution is found, thereby resulting in an almost-surely-terminating program, i.e., a program that terminates with probability 1. Since there is no upper bound on the number of runs of the algorithm, general recursion is necessary to express this pattern. For instance, this can be used to repeatedly run Shor's algorithm until the algorithm succeeds in finding a divisor. This might also be useful for variational quantum algorithms, because it allows us to express more flexible termination conditions, which give us more than simple iterations.
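The retry pattern described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: `noisy_divisor_search` is a hypothetical classical stand-in for a probabilistic algorithm (it is not Shor's algorithm), and the unbounded `while` loop plays the role that general recursion plays in Idris.

```python
import random


def repeat_until_success(attempt, rng):
    """Re-run a probabilistic routine until it reports success.

    There is no upper bound on the number of iterations; the loop
    terminates with probability 1 if each attempt can succeed.
    """
    tries = 0
    while True:
        tries += 1
        result = attempt(rng)
        if result is not None:
            return result, tries


def noisy_divisor_search(rng, n=15):
    """Hypothetical stand-in: guess a candidate and succeed only if it
    is a non-trivial divisor of n."""
    candidate = rng.randrange(2, n)
    return candidate if n % candidate == 0 else None


rng = random.Random(0)
divisor, tries = repeat_until_success(noisy_divisor_search, rng)
```

The number of attempts is not known in advance, which is why a fixed-size circuit cannot express this pattern.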

Safety Properties. We consider type safety in quantum programming to be important, because it is easy to make mistakes where one can copy qubits or forget to use a qubit. The former is physically inadmissible due to the no-cloning theorem of quantum mechanics [36] and the latter usually leads to unexpected behaviour, because discarding quantum information causes a side effect that may affect the rest of the quantum system. These observations suggest that we may design our systems and libraries carefully, by utilising linear typing features, so that these situations can be statically detected and rejected by the type system, therefore avoiding the problem. Otherwise, such situations could result in runtime errors (e.g., copying a qubit), which are clearly undesirable. In fact, in our experience, it is very easy to make such mistakes and this happened while we were implementing some of the quantum algorithms described in this paper. Our type-safe approach to quantum programming automatically detects and rejects these kinds of erroneous programs during type checking. While we do not have any proof of correctness, we believe that our approach is type-safe as long as the users do not modify our library files.

Why Idris instead of another language? The features that we require to achieve our objectives are: general recursion, dependent types and linearity. We chose Idris 2, because it is an excellent language that has all three of these features. Removing general recursion limits the expressivity of the language (as explained above). The other two features are used to reject erroneous quantum programs. We think that most programming languages that have the three features mentioned above are suitable for type-safe hybrid quantum-classical programming. In fact, one of the main points that we wish to demonstrate with this paper is that it is not necessary to build a standalone programming language in order to achieve the desired safety properties. Instead, the same can be achieved with already existing languages, such as Idris 2. This approach has some advantages (compared to designing a standalone language), such as: easier maintenance, larger library support, better integration with the newest developments in classical programming, etc.

# 2 Background on Quantum Computation

Readers interested in a detailed introduction to quantum computing may consult [26]. In this section we summarise the basic notions that are relevant for our development.

The simplest non-trivial quantum system is the quantum bit, often abbreviated as qubit. Qubits may be thought of as the quantum counterparts of the bit from classical computation. A qubit $|\psi\rangle$ is represented as a normalised vector in $\mathbb{C}^2$. The computational basis is given by the pair of vectors $|0\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|1\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, which may be seen as representing the classical bits 0 and 1. An arbitrary qubit is described by $|\psi\rangle = a|0\rangle + b|1\rangle$ where $a, b \in \mathbb{C}$ and $|a|^2 + |b|^2 = 1$.

A qubit may be in (uncountably) many different states, whereas a classical bit is either 0 or 1. When the linear combination $|\psi\rangle = a|0\rangle + b|1\rangle$ is non-trivial, we say that $|\psi\rangle$ is in superposition of $|0\rangle$ and $|1\rangle$. Superposition is a very important quantum resource which is used by many quantum algorithms.

Fig. 1. The Hadamard, Phase Shift, CNOT and CU gates.

The state space that describes a system of $n$ qubits is the Hilbert space $\mathbb{C}^{2^n}$. If $|\psi\rangle$ and $|\phi\rangle$ are two states of $n$ and $m$ qubits respectively, then the composite $(n+m)$-qubit state $|\psi\phi\rangle \stackrel{\text{def}}{=} |\psi\rangle \otimes |\phi\rangle$ is described by the Kronecker product $\otimes$ of the original states.

A quantum state $|\psi\rangle \in \mathbb{C}^{2^n}$ may undergo a unitary evolution described by a unitary matrix $U \in \mathbb{C}^{2^n \times 2^n}$, in which case the new state of the system is described by the vector $U|\psi\rangle$. Unitary operations (and matrices) are closed under sequential composition (described by matrix multiplication $\circ$) and under parallel composition (described by Kronecker product $\otimes$). Sequential composition of unitary operations is used to describe the temporal evolution of quantum systems, whereas the parallel composition is used to describe their spatial structure.
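As a small illustration of how composite states are formed, the Kronecker product of two state vectors can be computed directly. This is a minimal Python sketch; the helper name `kron_vec` and the list-of-amplitudes representation are assumptions made for the example only.

```python
def kron_vec(u, v):
    """Kronecker product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]


ket0 = [1, 0]  # |0>
ket1 = [0, 1]  # |1>

# |01> = |0> (x) |1>, a vector in C^4 with a single non-zero entry at index 1
ket01 = kron_vec(ket0, ket1)

# Adding a third qubit doubles the dimension again: C^8
ket010 = kron_vec(ket01, ket0)
```

The length of the resulting vector is the product of the input lengths, which is why an $n$-qubit state requires $2^n$ amplitudes.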

The unitary quantum operations are also often called unitary gates. One typically chooses a universal gate set which is a small set of unitary operations that suffices to express all other unitary operations via (parallel and sequential) composition. The universal gate set that we choose for our development is standard and we specify these unitary operations next by giving their action on the computational basis (which uniquely determines the operations).

The Hadamard Gate, denoted $H$, is the 1-qubit unitary map whose action on the computational basis is given by $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ and its primary purpose is to generate superposition. The Phase Shift Gate, denoted $P(\alpha)$, for $\alpha \in \mathbb{R}$, is a 1-qubit unitary map whose action on the computational basis is given by $P(\alpha)|0\rangle = |0\rangle$ and $P(\alpha)|1\rangle = e^{i\alpha}|1\rangle$ and its primary purpose is to modify the phase of a quantum state. The family of Phase Shift Gates is parameterised by the choice of $\alpha \in \mathbb{R}$ and important special cases include the unitary gates $T \stackrel{\text{def}}{=} P(\pi/4)$ and $Z \stackrel{\text{def}}{=} P(\pi)$. The Controlled-Not Gate (CNOT) is a 2-qubit unitary map whose action on the computational basis is given by $\mathrm{CNOT}|00\rangle = |00\rangle$; $\mathrm{CNOT}|01\rangle = |01\rangle$; $\mathrm{CNOT}|10\rangle = |11\rangle$ and $\mathrm{CNOT}|11\rangle = |10\rangle$ and this unitary map may be used to generate quantum entanglement.
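The actions on the computational basis listed above determine the gate matrices. The following Python sketch writes them out and applies them to basis vectors; the helper `mat_vec` and the nested-list matrix representation are assumptions for this example, and the 2-qubit basis is ordered as $|00\rangle, |01\rangle, |10\rangle, |11\rangle$.

```python
import cmath
import math


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


s = 1 / math.sqrt(2)

# Hadamard gate: columns are the images of |0> and |1>
H = [[s, s],
     [s, -s]]


def P(alpha):
    """Phase shift gate P(alpha): leaves |0> fixed, multiplies |1> by e^{i alpha}."""
    return [[1, 0],
            [0, cmath.exp(1j * alpha)]]


# CNOT with the first qubit as control and the second as target
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

h_on_0 = mat_vec(H, [1, 0])               # (|0> + |1>) / sqrt(2)
z_on_1 = mat_vec(P(math.pi), [0, 1])      # Z|1> = -|1>, up to rounding
cnot_on_10 = mat_vec(CNOT, [0, 0, 1, 0])  # CNOT|10> = |11>
```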

Unitary gates admit a diagrammatic representation as quantum circuits. The atomic unitary gates we described above are shown in Figure 1. Composite unitary gates may also be described as circuits (see Figure 2): sequential composition amounts to plugging wires of subdiagrams and parallel composition amounts to juxtaposition.

The CNOT gate is the simplest example of a controlled unitary gate. Given a unitary gate $U : \mathbb{C}^{2^n} \to \mathbb{C}^{2^n}$, the controlled-$U$ unitary gate is the unitary gate $CU : \mathbb{C}^{2^{n+1}} \to \mathbb{C}^{2^{n+1}}$ whose action is determined by the assignments $CU(|0\rangle \otimes |\psi\rangle) = |0\rangle \otimes |\psi\rangle$ and $CU(|1\rangle \otimes |\psi\rangle) = |1\rangle \otimes (U|\psi\rangle)$. Controlled unitary operations are ubiquitous in quantum computing (see Figure 1 for their circuit depiction).
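At the level of matrices, the two assignments above say that $CU$ is block diagonal, with an identity block for control $|0\rangle$ and $U$ for control $|1\rangle$. A minimal Python sketch follows; the function name `controlled_matrix` is an assumption for this example, and the NOT matrix `X` is the standard bit-flip gate, which is not named in the text.

```python
def controlled_matrix(u):
    """Matrix of the controlled version of a square matrix u,
    with the control as the first (most significant) qubit."""
    d = len(u)
    cu = [[0] * (2 * d) for _ in range(2 * d)]
    for i in range(d):
        cu[i][i] = 1                    # |0> (x) |psi> is left unchanged
        for j in range(d):
            cu[d + i][d + j] = u[i][j]  # |1> (x) |psi> becomes |1> (x) U|psi>
    return cu


# The NOT (bit-flip) gate; its controlled version is CNOT
X = [[0, 1],
     [1, 0]]

cnot = controlled_matrix(X)
```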

Fig. 2. A quantum circuit that may be used for the preparation of the Bell state.

Every unitary operation $U$ is reversible with the inverse operation given by the conjugate transpose, denoted $U^\dagger$, which is again a unitary matrix. Applying the inverse operation (i.e., the adjoint) of a given unitary map is ubiquitous.

A quantum state $|\psi\rangle \in \mathbb{C}^{2^n}$, with $n > 1$, is said to be entangled when there exists no non-trivial decomposition $|\psi\rangle = |\phi\rangle \otimes |\tau\rangle$. Quantum entanglement is a very important resource in quantum computation which is exhibited by many quantum algorithms. Because of the possibility of entanglement, we cannot, in general, break down quantum systems into smaller components and we are often forced to reason about such systems in their entirety. A very important example of an entangled state is the Bell state given by $|\mathrm{Bell}\rangle \stackrel{\text{def}}{=} \frac{|00\rangle + |11\rangle}{\sqrt{2}}$.

Preparing a new qubit in state $|0\rangle$ is an admissible physical operation. This, together with application of unitary gates as part of the computation, allows us to prepare arbitrary quantum states, e.g., the Bell state can be prepared by taking $|\mathrm{Bell}\rangle = (\mathrm{CNOT} \circ (H \otimes I))|00\rangle$ (see Figure 2).
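The claim that the conjugate transpose inverts a unitary can be checked numerically. The sketch below does this for the $T = P(\pi/4)$ gate; the helper names `adjoint_matrix` and `mat_mul` are assumptions for this example and work on square matrices stored as nested lists.

```python
import cmath


def adjoint_matrix(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[c][r]).conjugate() for c in range(n)] for r in range(n)]


def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


# T = P(pi/4)
T = [[1, 0],
     [0, cmath.exp(1j * cmath.pi / 4)]]

# Should equal the 2x2 identity up to floating-point rounding
product = mat_mul(adjoint_matrix(T), T)
```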
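The preparation of the Bell state can be verified step by step from the gate actions given earlier in this section, applying first the Hadamard gate to the first qubit and then CNOT:

$$
\begin{aligned}
(H \otimes I)\,|00\rangle &= (H|0\rangle) \otimes |0\rangle = \tfrac{1}{\sqrt{2}}\,\bigl(|00\rangle + |10\rangle\bigr),\\
\mathrm{CNOT}\,\tfrac{1}{\sqrt{2}}\,\bigl(|00\rangle + |10\rangle\bigr) &= \tfrac{1}{\sqrt{2}}\,\bigl(|00\rangle + |11\rangle\bigr) = |\mathrm{Bell}\rangle .
\end{aligned}
$$

The second line uses linearity together with $\mathrm{CNOT}|00\rangle = |00\rangle$ and $\mathrm{CNOT}|10\rangle = |11\rangle$.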

Quantum information cannot be directly observed without affecting the state of the underlying system. In order to extract information from quantum systems, we need to perform a quantum measurement on (parts of) our systems. For example, when performing a quantum measurement on a qubit in the state $|\psi\rangle = a|0\rangle + b|1\rangle$, there are two possible outcomes: either the quantum system will collapse to state $|0\rangle$ and we obtain the classical bit 0 as evidence of this event, or the quantum system will collapse to state $|1\rangle$ and we obtain the classical bit 1 as evidence of this event. The first outcome (corresponding to bit 0) occurs with probability $|a|^2$ and the second outcome (corresponding to bit 1) occurs with probability $1 - |a|^2 = |b|^2$. In general, when we measure $n$ qubits simultaneously, we obtain a bit string of length $n$ which determines the event that occurred and the quantum system collapses to a corresponding state with some probability, both of which are determined via the Born rule of quantum mechanics. Therefore, quantum measurements induce evolutions which are probabilistic and irreversible (or destructive), which distinguishes them from unitary evolutions, which are deterministic and reversible.

Unlike classical information, quantum information cannot be uniformly copied. This is made precise by the no-cloning theorem [36]. There exists no unitary operation $U : \mathbb{C}^4 \to \mathbb{C}^4$ such that for every qubit $|\psi\rangle$: $U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle$. This means that copying of quantum information is a physically inadmissible operation. Ideally, quantum programming languages should be designed so that these kinds of errors are detected during type checking.
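The single-qubit case of the measurement rule can be simulated classically. The following Python sketch is an illustration only: the function names are assumptions, and a seeded pseudo-random generator stands in for the physical source of randomness.

```python
import random


def measurement_probabilities(a, b):
    """Outcome probabilities for measuring a|0> + b|1>."""
    p0 = abs(a) ** 2
    p1 = abs(b) ** 2
    if abs(p0 + p1 - 1) > 1e-9:
        raise ValueError("state is not normalised")
    return p0, p1


def measure(a, b, rng):
    """Sample an outcome and return (bit, collapsed state as amplitudes)."""
    p0, _ = measurement_probabilities(a, b)
    if rng.random() < p0:
        return 0, (1, 0)  # collapsed to |0>
    return 1, (0, 1)      # collapsed to |1>


# Example state 0.6|0> + 0.8i|1>: outcome 0 with probability 0.36
p0, p1 = measurement_probabilities(0.6, 0.8j)
bit, collapsed = measure(0.6, 0.8j, random.Random(1))
```

After the call to `measure`, the original amplitudes are no longer recoverable from the collapsed state, which reflects the irreversibility of measurement.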
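The standard linearity argument behind the theorem can be sketched as follows. Suppose such a $U$ existed. Cloning the basis states would force $U|00\rangle = |00\rangle$ and $U|10\rangle = |11\rangle$. For the superposition $|{+}\rangle = \tfrac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, linearity of $U$ then gives

$$
U\bigl(|{+}\rangle \otimes |0\rangle\bigr) = \tfrac{1}{\sqrt{2}}\bigl(U|00\rangle + U|10\rangle\bigr) = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr),
$$

whereas cloning would require

$$
|{+}\rangle \otimes |{+}\rangle = \tfrac{1}{2}\bigl(|00\rangle + |01\rangle + |10\rangle + |11\rangle\bigr).
$$

The two vectors differ, so no such $U$ can exist.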

# 3 Background on the Idris 2 Language

In this section, we give a short overview of the Idris 2 language and its main features that are relevant for the development of Qimaera. Idris 2 is a functional language with a syntax influenced by that of Haskell. The features of particular interest for us are dependent types and linearity, both of which are crucial for Qimaera. Its type system is based on Quantitative Type Theory [24,6], which specifies how dependent types and linearity are combined.

Dependent Types. In Idris, types are first-class primitives and they may be manipulated like other constructs of the language. This allows us to formulate more expressive types that can depend on values, and hence it enables us to make some properties and program invariants explicit.

Example 1. The type of vectors is a simple and useful example of a dependent type. A vector is a list with a fixed length that is part of the type. It can be defined as follows, where S is the successor function for natural numbers, and a is a polymorphic type:

```
data Vect : Nat -> Type -> Type where
  Nil : Vect 0 a
  (::) : a -> Vect k a -> Vect ( S k ) a
```
The type Vect has two constructors (i.e., introduction rules). The first one constructs the empty vector, of length zero. The second one is used to introduce non-empty vectors: a vector with k+1 elements of type a is constructed by combining an element of type a and a vector of size k.

Type dependency allows us to specify useful program properties and type checking ensures that they hold. For instance, we can define an append function that concatenates two vectors. Then, the size of the output vector is the sum of the sizes of the input vectors and this is specified by its type.

append : Vect n a -> Vect m a -> Vect ( n + m ) a

This information allows the language to detect a larger class of programming errors. Note that type dependency information is not available for the analogous function on lists. Type dependency may also be used to express constraints on the inputs of a function, e.g., we can define a total function, called pop, that cannot be applied to an empty vector.

```
pop : Vect ( S k ) a -> Vect k a
pop ( x :: xs ) = xs
```
Writing "pop []" is now an error which is detected statically, rather than dynamically, and we note that the same cannot be achieved if we were to replace vectors with lists.

Linearity. The type system of Idris 2 is based on Quantitative Type Theory, where every function argument is associated with a multiplicity that states the number of times the variable is used at runtime<sup>5</sup>. This multiplicity can be 0, 1 or ω. An argument with multiplicity 0 is only used at compile time (to determine type dependency information) and is erased at runtime. A linear argument has multiplicity 1 and it is used exactly once at runtime. Finally, ω represents the unrestricted multiplicity, which is the default, where the function argument may be used any number of times.

Example 2. Consider the pop function which we just discussed. The (implicitly bound) variables k and a have multiplicity 0, because they are not explicitly specified as separate arguments, and they are not accessible at runtime in the function. The variables x and xs, which are explicitly bound, have the default (unrestricted) multiplicity.

Example 3. An important type which we define in Qimaera is the type of linear vectors, which we write as LVect. The only difference, compared to the standard vectors in Idris, is that the (::) constructor for LVect is a linear function in all of its arguments. Linearity in Idris 2 is specified by writing the multiplicity 1 in front of each argument.

```
data LVect : Nat -> Type -> Type where
  Nil : LVect 0 a
  (::) : (1 _ : a ) -> (1 _ : LVect k a ) ->
         LVect ( S k ) a
```
We also use linear pairs that are already defined in Idris 2.

    data LPair : Type -> Type -> Type where
      (#) : (1 _ : a ) -> (1 _ : b ) -> LPair a b

Linearity allows us to specify and enforce constraints on function arguments, e.g., it prevents us from duplicating data, so the function definition below leads to an error:

```
copy : (1 _ : a ) -> LPair a a
copy x = x # x

Error: While processing right hand side of copy.
There are 2 uses of linear name x.
```
Linearity is prominently used in Qimaera. In particular, when manipulating quantum data, linearity is enforced in order to properly handle quantum resources and comply with the laws of quantum mechanics.

Remark 1. We learned only recently that there is a type of linear vectors in the Idris libraries. In the future we might replace our implementation with the one provided by the Idris developers.

<sup>5</sup> This can be understood similarly to how variables are used in linear λ-calculi.

```
data Unitary : Nat -> Type where
  IdGate : Unitary n
  H : ( j : Nat ) ->
           { auto prf : ( j < n ) = True } ->
            Unitary n -> Unitary n
  P : ( p : Double ) -> ( j : Nat ) ->
           { auto prf : ( j < n ) = True } ->
            Unitary n -> Unitary n
  CNOT : ( c : Nat ) -> ( t : Nat ) ->
           { auto prf1 : ( c < n ) = True } ->
           { auto prf2 : ( t < n ) = True } ->
           { auto prf3 : ( c /= t ) = True } ->
            Unitary n -> Unitary n
```
Fig. 3. The Unitary data type (file: Unitary.idr).

# 4 Unitary Operations in Qimaera

We describe our representation of unitary transformations in Qimaera as an algebraic data type called Unitary. Every value of this type is, by design, an algebraic decomposition of a unitary operation in terms of the atomic unitary gates that we selected in §2.

The Unitary data type allows us to adopt a high-level algebraic and scalable approach towards the reversible fragment of quantum computation. This provides the programmer with some benefits as we show in this section. However, using the Unitary data type is actually entirely optional. Users who are interested in effectful quantum programming do not have to use it (see §5) and they may still do hybrid classical-quantum programming, but at the cost of losing the algebraic decomposition of unitary operations. However, there are many useful functions that are available for manipulating values of type Unitary that are not available for effectful quantum programs.

#### 4.1 The Unitary Data Type

Quantum unitary operations admit an algebraic representation based on the atomic gates from the universal gate set we described. Our idea for the representation of unitary operations is based on this, or equivalently, on how unitary operations may be expressed in terms of unitary quantum circuit diagrams. Because of these reasons, linearity is not required for our formalisation of unitary operations. The code for the Unitary data type is listed in Figure 3 and we now describe our representation in greater detail.

Given a natural number n : Nat, the type of unitary operations on n qubits is given by Unitary n. Note that Unitary is an algebraic data type with a simple type dependency on the arity of the desired operation. The Unitary type has four different introduction rules which we describe next.

The first constructor, IdGate, represents the identity unitary operation on n qubits. Diagrammatically, we can see this as constructing a circuit of n wires, without applying any other gates on any of the wires. It has a unique argument, n, which is implicit: it can be omitted when calling the IdGate constructor and it will often be inferred by Idris.

The second constructor, H, should be understood as applying the Hadamard gate H to the j-th qubit of some previously constructed unitary circuit which is specified as the last argument. The first implicit argument, n, is simply the arity of the resulting unitary operation. The second implicit argument, prf, is a proof obligation that j is smaller than n. This ensures that the argument j identifies an existing wire of the previously constructed unitary circuit (last argument) and therefore the overall definition is algebraically and physically sound. We think that the implicit argument prf may be removed from our implementation if we change the type of j to Fin n, the type of natural numbers less than n. However, in our experience, we found it easier to work with the current implementation rather than with Fin and for this reason we chose to keep the prf argument.

The third constructor, P, should be viewed as applying the $P(p)$ gate, where the real number $p \in \mathbb{R}$ is approximated by the term p : Double.<sup>6</sup> The remaining arguments serve the same purpose as those for H.

The final constructor, CNOT, should be understood as applying the CNOT gate, where c identifies the wire used for the control (the small black dot in Figure 1), t identifies the wire of the target (the crossed circle in Figure 1) and the last (unnamed) argument is the previously constructed unitary circuit on which we are applying CNOT. The remaining arguments are implicit: the argument n is the arity of the unitary; prf1 and prf2 ensure that c and t identify valid wires of the unitary circuit; prf3 ensures that the control and target wires are distinct and therefore the overall application of CNOT is physically and algebraically admissible.

In our representation of quantum unitary operations, we make use of type dependency to impose proof obligations on some of our constructors in order to guarantee that the representation makes sense in physical and algebraic terms. Indeed, this might sometimes be a burden for the users of the library. However, Idris can sometimes automatically infer the required proofs without any assistance from the user, e.g., when all arguments are statically known constants (see Example 4). This is discussed in detail in the next subsection.

#### 4.2 Constructing Unitary Transformations

The four basic introduction rules of the Unitary type allow us to define high-level functions in Idris that can be used to construct complex unitary circuits out of simpler ones. We discuss this here and we show that the proof obligations from Figure 3 can sometimes be ameliorated and sometimes even completely sidestepped.

<sup>6</sup> This approximation is not a big limitation: in fault-tolerant quantum computing one usually replaces the P(p) gate family with a single T = P(π/4) gate and the resulting gate set suffices to achieve approximation with arbitrary precision. So we can easily replace P with a T constructor.

First, we point out that auto-implicit arguments may occasionally be inferred by Idris via suitable search. For example, if all the arguments are known statically, the required proofs will often be discovered by Idris and then the users do not have to manually provide them.

Example 4. The unitary circuit from Figure 2 may be constructed in the following way:

```
toBellBasis : Unitary 2
toBellBasis = CNOT 0 1 ( H 0 IdGate )
```
In this example, Idris is able to infer all the implicit arguments and there is no need to provide any proofs. If we do not satisfy one of the constraints, e.g., if we write CNOT 1 1 above (which does not make physical sense), then we get the following error during type checking:

```
Error: While processing right hand side of toBellBasis.
Can't find an implementation for not (== 1 1) = True.
```
An error is also reported if we provide a wire number larger than 1. It is also useful to define standalone unitary gates for the H, P(r) and CNOT gates as follows:

```
HGate : Unitary 1
HGate = H 0 IdGate
PGate : Double -> Unitary 1
PGate r = P r 0 IdGate
CNOTGate : Unitary 2
CNOTGate = CNOT 0 1 IdGate
```
Composing Unitary Circuits. Our libraries provide functions for sequential composition (compose) and parallel composition (tensor) of unitary operations:

```
compose : Unitary n -> Unitary n -> Unitary n
tensor : {n : Nat } -> { p : Nat } -> Unitary n
                    -> Unitary p -> Unitary ( n + p )
```
Notice that neither function requires proof obligations like the ones from Figure 3. This means that one of the main algebraic ways of composing unitary operations is available without such proofs. The use of these functions is ubiquitous in practice and we introduce the infix synonyms (.) and (#) for compose and tensor, respectively.

Example 5. The toBellBasis gate from Example 4 may be equivalently expressed in the following way:

```
toBellBasis : Unitary 2
toBellBasis = CNOTGate . ( HGate # IdGate )
```
Qimaera provides another, more general, form of composition via the function apply whose type is as follows:

```
apply : { i : Nat } -> { n : Nat } ->
        Unitary i -> Unitary n ->
        ( v : Vect i Nat ) ->
        { auto _ : isInjective n v = True } ->
        Unitary n
```
The apply function is used to apply a smaller unitary circuit of size i to a bigger one of size n, giving the vector v of wire indices on which we wish to apply the smaller circuit. It needs one auto-implicit proof which enforces the consistency requirement that all indices of the wires specified by v are pairwise distinct and smaller than n. In fact, the apply function implements the most general notion of composition that we support. Both sequential and parallel composition can be realised as special cases using it. The importance of the vector v is that it determines how to apply the smaller unitary circuit of arity i to any selection of i wires of the larger unitary circuit, and moreover, it also allows us to permute the inputs/outputs of the smaller unitary circuit while doing so. More specifically, if the k-th entry of the vector v is the natural number p, then the k-th input/output of the smaller unitary circuit will be applied to the p-th wire of the larger unitary circuit. This is best understood by example.

Example 6. Consider the following code sample:

```
U : Unitary 3
U = HGate # IdGate { n = 1} # ( PGate pi )
apply_example : Unitary 3
apply_example = apply toBellBasis U v
```
where v is a vector of length two. Here, toBellBasis is given in Example 4 and represents the circuit given below left; U represents the circuit given below right:

Table 1 shows what unitary circuit is specified under different values of v. In these cases, Idris can automatically infer the required proofs and the user does not have to provide them.
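The consistency requirement that apply places on the vector v (all indices pairwise distinct and smaller than n) can be mirrored in a few lines of Python. This is a hypothetical model of the condition only, not the definition of isInjective used in Qimaera; the function names `is_injective` and `wire_assignment` are chosen for this example.

```python
def is_injective(n, v):
    """True when every index in v is a valid wire (below n) and no index repeats."""
    return all(0 <= i < n for i in v) and len(set(v)) == len(v)


def wire_assignment(v):
    """Map the k-th input/output of the smaller circuit to wire v[k] of the larger one."""
    return {k: p for k, p in enumerate(v)}


ok = is_injective(3, [2, 0])         # distinct and in range
repeated = is_injective(3, [1, 1])   # the same wire is used twice
too_large = is_injective(3, [0, 3])  # wire 3 does not exist in a 3-qubit circuit
mapping = wire_assignment([2, 0])    # input 0 goes to wire 2, input 1 to wire 0
```

In Idris the corresponding check is discharged at compile time, whereas this sketch evaluates it at run time.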

Remark 2. Instead of using apply, there is another possible approach, in the spirit of symmetric monoidal categories [23, §XI], where we could add one extra introduction rule to the Unitary type for representing permutations of wires. However, in our view, this approach is less appealing, because one does not usually think of permutations (induced by the symmetric monoidal structure) as physical gates.

Table 1. Examples illustrating the apply function.

Adjoints of Unitary Circuits. Qimaera also provides a function

adjoint : Unitary n -> Unitary n

which computes the adjoint (i.e., inverse) of a given unitary circuit. One often has to apply the inverse of a given unitary circuit, so having a method such as this one is useful. Our implementation uses the standard approach for synthesising the adjoint. The adjoint may be used, for example, to uncompute the result of the application of unitary gates on auxiliary qubits.
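One common way to synthesise the adjoint of a circuit over this gate set is to reverse the order of the gates and invert each one, using $H^\dagger = H$, $\mathrm{CNOT}^\dagger = \mathrm{CNOT}$ and $P(\alpha)^\dagger = P(-\alpha)$. The Python sketch below illustrates this on a hypothetical list-of-tuples circuit representation; it is not Qimaera's implementation, whose details are not given here.

```python
def adjoint_circuit(gates):
    """Invert a circuit given as a list of gates in application order.

    Assumed gate encoding (for this example only):
    ("H", wire), ("P", angle, wire), ("CNOT", control, target).
    """
    inverse = []
    for gate in reversed(gates):
        if gate[0] == "P":
            _, angle, wire = gate
            inverse.append(("P", -angle, wire))  # P(a) is inverted by P(-a)
        else:
            inverse.append(gate)                 # H and CNOT are self-inverse
    return inverse


circuit = [("H", 0), ("P", 0.5, 0), ("CNOT", 0, 1)]
undo = adjoint_circuit(circuit)
```

Running `circuit` followed by `undo` acts as the identity, which is the property used when uncomputing auxiliary qubits.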

Controlled Unitary Circuits. We also implement a function

```
controlled : { n : Nat } -> Unitary n -> Unitary ( S n )
```
which given a unitary circuit U constructs the corresponding controlled unitary circuit CU. Our implementation uses the standard and simple algorithm for doing this, but more efficient algorithms may also be implemented in principle.

Analysis of Unitary Circuits. Unitary circuits are represented in a scalable way in Qimaera and we can use Idris to optimise them. In particular, the function:

```
optimise : Unitary n -> Unitary n
```
may be used to optimise a given unitary circuit by reducing the number of gates while keeping the action of the circuit unchanged. So far, this function provides only very basic optimisations, but more sophisticated and powerful ones may be added in principle. The point we wish to make is that unitary circuits in Qimaera may be analysed and manipulated like other algebraic data type structures using the capabilities of Idris. In fact, the file Unitary.idr also provides other functions that do this. For example, we provide functions for calculating the circuit depth, calculating the number of specific atomic gates used by a circuit, drawing circuits in the terminal and exporting circuits to Qiskit so that users may then use external analysis tools.

Fig. 4. The QFT unitary circuit on n qubits.
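To give a flavour of what a basic gate-reducing pass over an algebraic circuit representation might look like, the Python sketch below cancels pairs of consecutive Hadamard gates on the same wire, using $H \circ H = I$. Both the list-of-tuples encoding and the pass itself are assumptions for illustration; the text does not specify which rewrites optimise performs.

```python
def cancel_adjacent_hadamards(gates):
    """Remove pairs of consecutive identical H gates from a gate list.

    Assumed gate encoding (for this example only):
    ("H", wire), ("P", angle, wire), ("CNOT", control, target).
    """
    optimised = []
    for gate in gates:
        if optimised and gate[0] == "H" and optimised[-1] == gate:
            optimised.pop()          # H followed by H on the same wire is the identity
        else:
            optimised.append(gate)
    return optimised


def count_gates(gates, name):
    """Number of gates of a given kind, a simple circuit statistic."""
    return sum(1 for gate in gates if gate[0] == name)


circuit = [("H", 0), ("H", 1), ("H", 1), ("H", 0), ("CNOT", 0, 1)]
reduced = cancel_adjacent_hadamards(circuit)
```

Because the pass keeps a stack of emitted gates, cancelling an inner pair can expose an outer pair, as happens for the four Hadamard gates in `circuit`.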

#### 4.3 Example: The Quantum Fourier Transform

The Quantum Fourier Transform (QFT) is an important unitary operator that is used in Shor's polynomial-time algorithm for integer factorisation [34]. The unitary circuit which realises QFT on $n$ qubits is shown in Figure 4, where $R_n \stackrel{\text{def}}{=} P\left(\frac{2\pi}{2^n}\right)$. The Qimaera code which implements this unitary circuit is shown in Figure 5. Notice that we make use of the controlled function from §4.2 in the function cRm, so that we can implement the controlled $R_n$ gates that are required. In this example, we have parameters that are universally quantified, so we need a few proofs in the code: one for using the apply function and one for correctly unifying the size of the circuit. These proof obligations appear when writing the qftRec function and Idris did not infer them automatically, so we had to provide the proofs. To get some intuition for the code: the qftRec function computes the recursive pattern that applies a Hadamard gate followed by the cascade of controlled $R_n$ gates; the qft function then computes the other recursive pattern, which consists in repeatedly using the pattern computed by qftRec and composing as appropriate.
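For reference, the operator that such a circuit realises can also be written down directly as a matrix. The Python sketch below uses the standard textbook definition, with entries $\omega^{jk}/\sqrt{2^n}$ for $\omega = e^{2\pi i/2^n}$, which is not stated in the text above; depending on conventions, a circuit such as the one in Figure 4 may agree with this matrix only up to a reordering of the qubits. The function names are assumptions for this example.

```python
import cmath
import math


def qft_matrix(n):
    """Matrix of the quantum Fourier transform on n qubits (standard definition)."""
    dim = 2 ** n
    omega = cmath.exp(2j * cmath.pi / dim)
    scale = 1 / math.sqrt(dim)
    return [[scale * omega ** (j * k) for k in range(dim)] for j in range(dim)]


def is_unitary(m, tol=1e-9):
    """Check that the conjugate transpose of m times m is the identity."""
    dim = len(m)
    for r in range(dim):
        for c in range(dim):
            entry = sum(m[k][r].conjugate() * m[k][c] for k in range(dim))
            if abs(entry - (1 if r == c else 0)) > tol:
                return False
    return True


qft1 = qft_matrix(1)  # coincides with the Hadamard gate, up to rounding
qft3 = qft_matrix(3)  # an 8 x 8 unitary matrix
```

The phases $e^{2\pi i/2^m}$ that appear in the entries are exactly the ones contributed by the $R_m$ gates.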

# 5 Effectful Quantum Computation

In the previous section we showed how unitary circuits can be represented in Qimaera. This suffices to capture the pure, deterministic and reversible fragment of quantum computation. However, we need to also consider effectful and probabilistic quantum processes which may result from quantum measurements, because this is important for hybrid quantum-classical computation. In this section, we show how this can be done in a type-safe way by using monads, linearity and dependent types.

```
Rm : Nat -> Unitary 1
Rm m = PGate (2 * pi / ( pow 2 ( cast m )))
cRm : Nat -> Unitary 2
cRm m = controlled ( Rm m )
qftRec : ( n : Nat ) -> Unitary n
qftRec 0 = IdGate
qftRec 1 = HGate
qftRec ( S ( S k )) =
  let t = ( qftRec ( S k )) # IdGate
  in rewrite sym $ lemmaplusOneRight k
  in apply ( cRm ( S ( S k ))) t [ S k ,0]
            { prf = lemmaInj1 k }
qft : ( n : Nat ) -> Unitary n
qft 0 = IdGate
qft ( S k ) =
  let g = qftRec ( S k )
      h = ( IdGate { n = 1}) # ( qft k )
  in h . g
```
Fig. 5. Qimaera code for QFT (file: QFT.idr).

#### 5.1 Representation of Quantum Effects in Qimaera

We now explain how the quantum program dynamics are represented in Qimaera in a type-safe way. We are (roughly) inspired by representing the notion of a quantum configuration as it appears in [32,29,22], which is in turn used to formally describe the operational semantics of quantum type systems.

Qubits in Qimaera. Because of the possibility of quantum entanglement, we cannot describe the state of an individual qubit which is part of a larger composite system. On the other hand, we wish to be able to refer to parts of the whole system by identifying specific qubit positions. In Qimaera, we introduce the following type declaration:

    data Qubit : Type where
      MkQubit : ( n : Nat ) -> Qubit

The argument of type Nat is used as a unique identifier for the constructed qubit. The constructor MkQubit is private and users of our libraries cannot access it (outside of the library file). Instead, our libraries provide functions (Figure 7) that ensure that a term of type Qubit is created with a fresh (i.e., unique) natural number that serves as its identifier within a monadic environment. This is handled by our functions through careful manipulation of the available data within the monadic environment. In fact, these functions are the expected way for our users to access or manipulate qubits and, moreover, our users cannot access the unique identifiers (unless they modify our libraries). This allows us to formulate a representation where values of type Qubit unambiguously refer to the relevant parts of larger composite systems. Therefore, a value of type Qubit should be understood as a pointer, or as a unique identifier, of a 1-qubit subsystem of some larger quantum state. Terms of type Qubit do not carry any sort of linear-algebraic information.

Probabilistic Effects. Quantum measurements induce probabilistic computational effects which are inherited by the classical side of the computation in hybrid classical-quantum algorithms. Furthermore, in our intended computational scenario, the classical computer (on which Idris is running) sends instructions to, and receives data from, the quantum device. In order to correctly model all of this, it is clear that we have to use the IO monad in order to encapsulate these effects. However, when representing quantum program dynamics, we also need to enforce linearity, but all the functions provided by the IO monad (e.g., pure which introduces pure values to monadic types) are not linear in any of their arguments. This creates a problem which may be solved by using the LIO library, which extends the IO monad with linearity. For brevity, we define R to be our linear IO monad:

R : Type -> Type R = L IO { use = Linear }

Then, by using R we can combine IO effects (and thus also probabilistic effects) and linearity in a suitable way.

Quantum State Transformer. Quantum computation is effectful, and moreover, quantum information cannot be observed by the classical computer (on which Idris is running): it only receives classical information through communication with the quantum device. Because of this, we adopt a more abstract view on the hybrid classical-quantum computational process. In order to do this, we define an (abstract) quantum state transformer by combining several different concepts: indexed state monads [4]<sup>7</sup>, linearity and IO (and thus also probabilistic) effects. Our representation of these ideas in Qimaera is shown in Figure 6, where we omit the function definitions for brevity.

<sup>7</sup> See [33] for a Haskell implementation of this idea.

```
-- A linear function from an initial state to a (final state, result) pair in R.
data QStateT : Type -> Type -> Type -> Type where
  MkQST : (1 _ : (1 _ : initialType) ->
                 R (LPair finalType returnType)) ->
          QStateT initialType finalType returnType

-- Unwrap the transformer and run it on an initial state.
runQStateT : (1 _ : initialType) ->
             (1 _ : QStateT initialType finalType returnType) ->
             R (LPair finalType returnType)

-- Return a value without changing the state.
pure : (1 _ : a) -> QStateT t t a

-- Sequence two transformers; the intermediate state type m must match.
(>>=) : (1 _ : QStateT i m a) ->
        (1 _ : ((1 _ : a) -> QStateT m o b)) ->
        QStateT i o b
```
Fig. 6. Quantum state transformer (file: QStateT.idr).

The type QStateT is parameterised by a choice of three (arbitrary) types, so it is fairly abstract. Soon, we will see that it is very useful for our purposes. The intended interpretation of this type is the following: any value of type

QStateT initialType finalType returnType

represents a stateful (quantum) computation starting from a (quantum) state of type initialType and ending in a (quantum) state of type finalType which produces a user-accessible result of type returnType during the computation. For example, a value of type

QStateT (LPair Qubit Qubit) Qubit Bool

should be understood as a quantum process that transforms a two-qubit state into a single-qubit state and returns a single (classical) value of type Bool to the user. The functions presented in Figure 6 allow us to adopt a monadic programming discipline when working with QStateT and we do so henceforth. We remark that QStateT makes use of the monad R which encapsulates the IO (and probabilistic) effects and that linearity is enforced when working with QStateT.
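The sequencing discipline described above can be illustrated outside Idris. The following is a minimal Python sketch, not Qimaera's implementation: it models a state transformer as a function from an initial state to a (final state, result) pair and shows how `pure` and bind thread the state. Python has no linear types, so linearity is not enforced here, and the names `new_qubit` and `discard_qubit`, as well as the use of a plain qubit counter as the "state", are hypothetical.

```python
from typing import Any, Callable, Tuple


class QStateT:
    """A state transformer: wraps a function initial_state -> (final_state, result)."""

    def __init__(self, step: Callable[[Any], Tuple[Any, Any]]) -> None:
        self.step = step

    def run(self, initial: Any) -> Tuple[Any, Any]:
        # Analogue of runQStateT: feed in the initial state.
        return self.step(initial)

    def bind(self, k: Callable[[Any], "QStateT"]) -> "QStateT":
        # Analogue of (>>=): run self, pass its result to k, and run the
        # transformer returned by k on the intermediate state.
        def step(initial: Any) -> Tuple[Any, Any]:
            middle, a = self.step(initial)
            return k(a).step(middle)

        return QStateT(step)


def pure(value: Any) -> QStateT:
    # Analogue of pure: leave the state untouched and return the value.
    return QStateT(lambda state: (state, value))


# Toy state (hypothetical): just the number of live qubits.
def new_qubit() -> QStateT:
    # State n -> state n + 1, returning the fresh identifier n.
    return QStateT(lambda n: (n + 1, n))


def discard_qubit(ident: int) -> QStateT:
    # State n -> state n - 1, returning a fixed classical bit.
    return QStateT(lambda n: (n - 1, False))


program = new_qubit().bind(lambda q: discard_qubit(q)).bind(lambda b: pure((b,)))
final_state, result = program.run(0)
```

In this toy model the program starts and ends with zero live qubits, which mirrors the shape `QStateT (t 0) (t 0) ...` used later for runnable programs; in Idris that property is checked by the type system rather than at run time.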

Effectful Quantum Programming. The QStateT monad can be used to define a suitable abstract interface for quantum programming. In Figure 7, we present an excerpt of the QuantumOp interface which allows us to write quantum programs and execute them in a type-safe way. All of the hybrid quantum-classical algorithms we present are implemented using this interface.

The function newQubits is used to prepare p new qubits in state $|0\rangle$ and the function returns a linear vector of length p with the qubit identifiers of the newly created qubits. The function applyUnitary is used to apply a unitary operation of arity i to the qubits specified by the argument LVect (which also determines the order of application) and the operation returns an LVect which serves the same purpose: it identifies the qubits which were just modified by the unitary operator. The file QuantumOp.idr also provides functions applyH, applyP and applyCNOT which can be seen as special cases of applyUnitary. However, these three functions do not depend on the Unitary type.

The measure function is used to measure i qubits identified by the LVect argument and it returns a value of type Vect i Bool that represents the result of the measurement. After this, the i measured qubits are not reused, as one can see from the provided type information.

```
interface QuantumOp (0 t : Nat -> Type) where
  newQubits : (p : Nat) -> QStateT (t n) (t (n+p)) (LVect p Qubit)
  newQubit : QStateT (t n) (t (S n)) Qubit
  applyUnitary : {n : Nat} -> {i : Nat} -> (1 _ : LVect i Qubit) ->
    Unitary i -> QStateT (t n) (t n) (LVect i Qubit)
  applyH : {n : Nat} -> (1 _ : Qubit) -> QStateT (t n) (t n) Qubit
  applyP : {n : Nat} -> Double -> (1 _ : Qubit) ->
    QStateT (t n) (t n) Qubit
  applyCNOT : {n : Nat} -> (1 _ : Qubit) -> (1 _ : Qubit) ->
    QStateT (t n) (t n) (LPair Qubit Qubit)
  measure : {n : Nat} -> {i : Nat} -> (1 _ : LVect i Qubit) ->
    QStateT (t (i + n)) (t n) (Vect i Bool)
  measureQubit : {n : Nat} -> (1 _ : Qubit) ->
    QStateT (t (S n)) (t n) Bool
  measureAll : {n : Nat} -> (1 _ : LVect n Qubit) ->
    QStateT (t n) (t 0) (Vect n Bool)
  run : QStateT (t 0) (t 0) (Vect n Bool) -> IO (Vect n Bool)
```
Fig. 7. The QuantumOp interface (file: QuantumOp.idr).

Finally, the function run is used to execute quantum algorithms on the quantum device and obtain the classical information returned from it. Notice that run can be used to execute effectful quantum processes which start from the trivial quantum state (on zero qubits) and which terminate in the same trivial quantum state, but which also produce some number of classical bits as a user-accessible return result. This may be used to run quantum algorithms: in a typical situation, we start with the trivial quantum state (on zero qubits), we prepare n qubits in state $|0\rangle$, we apply some unitary operations on them, and we finally measure all the qubits, thereby producing n bits of classical information. This quantum algorithm may then be represented as a value of type QStateT (t 0) (t 0) (Vect n Bool). Running it, however, produces a classical value of type IO (Vect n Bool), because the execution is probabilistic and because our classical computer (on which we are running Idris) has to perform IO actions to communicate with the quantum device.

In fact, all of the above operations modify the quantum state on the quantum device and may cause IO effects, because of the need to communicate with the quantum device. This is indeed reflected by our interface. Observe that our interface is defined using the QStateT monad transformer, which incorporates IO effects (via the R monad we discussed previously).

Example 7. A fair coin toss may be implemented using quantum resources. The process is simple: (1) prepare the state $|0\rangle$; (2) apply the H gate to it; (3) measure the qubit and return this as output. We implement this as follows:

```
coin : QuantumOp t => IO Bool
coin = do
  [b] <- run ( do
            q <- newQubit {t = t}
            q <- applyH q
            r <- measure [q]
            pure r
         )
  pure b
```
The top-level do block simply realises monadic sequencing for the standard IO monad. The do block within the run environment is more interesting and crucial for our development. It performs monadic sequencing for the QStateT monad and it represents the simple three-step algorithm we just described. The call to the run function executes this algorithm and users obtain the produced classical information by storing it in the variable b of type Bool. We emphasise that linearity is enforced within the run environment and this is what brings safety properties in our approach, e.g., all of the following scenarios are statically detected and rejected by Idris: passing the qubit q to a non-linear function, copying the qubit q, forgetting to measure the qubit q. For example, if in the above code we replace the last two statements in the run environment with "pure True", then Idris statically detects this error.
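To make the three steps of Example 7 concrete at the level of amplitudes, here is a small Python sketch of what a linear-algebraic simulation of the coin toss computes. It is an illustration under our own assumptions, not the SimulatedOp code: the helper names `apply_h` and `measure` and the tuple representation of a single-qubit state are hypothetical.

```python
import math
import random


def apply_h(state):
    # Hadamard gate on a single-qubit state (a, b) = a|0> + b|1>.
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))


def measure(state, rng):
    # Born rule: outcome True (i.e. 1) with probability |b|^2.
    _, b = state
    return rng.random() < abs(b) ** 2


def coin(rng):
    q = (1.0, 0.0)          # (1) prepare |0>
    q = apply_h(q)          # (2) apply H
    return measure(q, rng)  # (3) measure and return the outcome


rng = random.Random(0)
tosses = [coin(rng) for _ in range(10_000)]
frequency_of_heads = sum(tosses) / len(tosses)
```

After the Hadamard gate both amplitudes have squared modulus 1/2, so each outcome is equally likely, which is why the process is a fair coin.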

The function coin from Example 7 is implemented using our abstract interface. This means we can use this function in any concrete implementation of the QuantumOp interface. Since the authors do not have any quantum hardware, we provide one concrete implementation of this interface, called SimulatedOp, which performs linear-algebraic simulation of all the required operations. For example, if we wish to use the coin function, then the code:

```
testCoin : IO Bool
testCoin = coin { t = SimulatedOp }
```
defines a new function, called testCoin, which does the same as coin, but it specifically instructs Idris to use linear-algebraic simulation. We emphasise that all of our quantum algorithms are written using our abstract interface, so there is no need to reimplement them for any additional concrete implementations of the interface.

#### 5.2 Example: Repeat-Until-Success Algorithm

Repeat-until-success (RUS) [27] is an algorithm for implementing quantum unitary operators by using quantum measurements and general unbounded recursion. The main advantage of using RUS over traditional deterministic techniques that synthesise unitary operators is that with RUS the expected number of T gates (which are expensive in terms of error correction<sup>8</sup>) can be reduced.

```
RUS : QuantumOp t => (1 _ : Qubit) ->
      (u' : Unitary 2) -> (e : Unitary 1) ->
      QStateT (t 1) (t 1) Qubit
RUS q u' e = do
  q' <- newQubit
  [q', q] <- applyUnitary [q', q] u'
  b <- measureQubit q'
  if b then do
         [q] <- applyUnitary [q] (adjoint e)
         RUS q u' e
       else pure q

example_u' : Unitary 2
example_u' = H 0 $ T 0 $ CNOT 0 1 $ H 0 $ CNOT 0 1 $ T 0 $
             H 0 IdGate

runRUS : QuantumOp t => IO Bool
runRUS = do
  [b] <- run (do
               q <- newQubit {t = t}
               q <- RUS q example_u' IdGate
               measure [q]
         )
  pure b

testRUS : IO Bool
testRUS = runRUS {t = SimulatedOp}
```
Fig. 8. Repeat-until-success algorithm (file: RUS.idr).

In the simplest case, we wish to realise a fixed single-qubit unitary operator $U : \mathbb{C}^2 \to \mathbb{C}^2$. The RUS algorithm is as follows. Given an input qubit $|\psi\rangle$: (1) prepare a new qubit in state $|0\rangle$; (2) apply a two-qubit unitary operator $U'$ (chosen in advance depending on $U$); (3) measure the first qubit; (4) if the measurement outcome is 0 (which occurs with probability $p > 0$), then the output state is $U|\psi\rangle$, as required, and the algorithm terminates; otherwise the current state is $E|\psi\rangle$, where $E$ is some other unitary operator (chosen in advance depending on $U$), so we apply $E^\dagger$ to this state and we go back to step (1). The unitary operators $U'$ and $E$ are chosen in advance, depending on $U$, before the algorithm starts so that the above conditions are satisfied. Note that synthesising $U'$ and $E$ is not part of the algorithm and we do not discuss this here.

Assuming that appropriate $U'$ and $E$ are chosen, this process terminates with probability 1 in state $U|\psi\rangle$ (provided $p > 0$), so RUS indeed implements the unitary operator $U$. Note that this is an algorithmic realisation of $U$, not an algebraic one, and so we cannot write a program of type Unitary that achieves this. Instead, we represent this as a quantum program in Figure 8. There, RUS q u' e is the quantum state transformer which implements the RUS algorithm as above. The function runRUS simply executes the RUS algorithm on a qubit in state $|0\rangle$, with the unitary operator chosen from [27, Figure 8], then measures the qubit and returns the outcome. Both of these functions are written using our abstract interface. The function testRUS is the same as runRUS, but it also instructs Idris to use linear-algebraic simulation for the execution. Note that, in our implementation, we have taken a specific instance of RUS by choosing $U'$ to be the unitary operator described by example_u' as discussed in [27, Figure 8].

<sup>8</sup> We do not automatically implement error correction, so it has to be handled either by the developer or provided by the quantum device on the remote end.

Remark 3. The run(-) environment enforces linearity, so if we wish to use the RUS function within it, then the qubit argument must be linear in RUS.
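The termination behaviour of RUS can be made quantitative with a short calculation. The following is a sketch under one assumption that the description of the algorithm suggests but does not state explicitly: every round succeeds with the same probability $p > 0$, independently of earlier rounds (after a failed round the correction $E^\dagger$ restores the input state). The symbol $N$, denoting the number of rounds until success, is our own notation.

```latex
\begin{align*}
  \Pr[N = k] &= (1-p)^{k-1}\, p, \qquad k \geq 1, \\
  \Pr[N > k] &= (1-p)^{k} \xrightarrow[k \to \infty]{} 0, \\
  \mathbb{E}[N] &= \sum_{k \geq 1} k\,(1-p)^{k-1}\, p = \frac{1}{p}.
\end{align*}
```

Under this assumption the recursion in Figure 8 terminates with probability 1, although no fixed bound on the number of rounds exists, and it needs $1/p$ rounds on average.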

# 6 Variational Quantum Programming

In the previous section we saw that Qimaera is suitable for writing recursive and effectful quantum programs that make use of quantum measurements. Moreover, Idris 2 is an excellent programming language with an advanced type system and first-class support for classical programming features. In order to demonstrate that Qimaera is suitable for hybrid classical-quantum programming, we also have to show that both classical and quantum programming features may be elegantly combined. This is the purpose of this section and we achieve this by implementing the two most prominent variational quantum algorithms: the Quantum Approximate Optimization Algorithm (QAOA) [13] and the Variational Quantum Eigensolver (VQE) [30]. In this paper we only describe QAOA. See the full paper [12] for more information on the implementation of VQE.

The objective of QAOA is to try to find the minimum (or maximum) eigenvalue of a Hamiltonian. A Hamiltonian is a Hermitian (i.e., self-adjoint) matrix $\mathcal{H}$ (we use a calligraphic font to differentiate it from $H$, the Hadamard matrix). Its minimum eigenvalue is the minimum (real) value $\lambda$ such that $\mathcal{H}|\psi\rangle = \lambda|\psi\rangle$ for some nonzero vector $|\psi\rangle$. As $\mathcal{H}$ is unitarily diagonalizable, this is equivalent to the minimum of $\langle\psi|\mathcal{H}|\psi\rangle$ over all vectors $|\psi\rangle$ of norm 1, where $\langle\psi| \stackrel{\text{def}}{=} |\psi\rangle^\dagger$.
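The equivalence between the minimum eigenvalue and the minimum expectation value follows from a standard argument, sketched here for completeness. The orthonormal eigenvectors $|v_j\rangle$, eigenvalues $\lambda_j$ and coefficients $c_j$ are notation introduced only for this sketch.

```latex
\begin{align*}
  \mathcal{H} &= \sum_j \lambda_j\, |v_j\rangle\langle v_j|,
  \qquad
  |\psi\rangle = \sum_j c_j\, |v_j\rangle,
  \qquad
  \sum_j |c_j|^2 = 1, \\
  \langle\psi|\,\mathcal{H}\,|\psi\rangle
  &= \sum_j \lambda_j\, |c_j|^2
  \;\geq\; \lambda_{\min} \sum_j |c_j|^2
  \;=\; \lambda_{\min}.
\end{align*}
```

Equality holds when $|\psi\rangle$ is a unit eigenvector for $\lambda_{\min}$, so the minimum over all unit vectors is exactly the minimum eigenvalue.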

QAOA starts with some assumption on what the vector $|\psi\rangle$ looks like and usually $|\psi\rangle$ is prepared by a quantum circuit that depends on some real parameters $\alpha_1, \ldots, \alpha_p$. By measuring this state $|\psi\rangle$, one obtains some information on the value of $\langle\psi|\mathcal{H}|\psi\rangle$. This information can then be fed to a classical optimizer to change the value of the parameters $\alpha_1, \ldots, \alpha_p$ for subsequent execution.

This classical-quantum back and forth is repeated until some termination condition is satisfied. For example, we may simply repeat this process $k$ times, where $k \in \mathbb{N}$ is some constant, but more sophisticated termination conditions are also possible. However, there is no guarantee that we will find the minimum eigenvalue.

Implementation of QAOA. QAOA is a variational algorithm [13] that approximately solves optimization problems. Let $f : \{0, 1\}^n \to \mathbb{R}$ be a function for which we want to find its minimum. We see $f$ as a diagonal Hamiltonian over n qubits defined by $\mathcal{H}|x\rangle = f(x)|x\rangle$ for all $x \in \{0, 1\}^n$. We are therefore searching for the minimum eigenvalue of this Hamiltonian.

In this case, the state $|\psi\rangle$ that minimises the Hamiltonian $\mathcal{H}$ is often assumed to be of the form: $|\psi\rangle = (H P(\beta_p) H)^{\otimes n} e^{\gamma_p \mathcal{H}} \cdots (H P(\beta_1) H)^{\otimes n} e^{\gamma_1 \mathcal{H}} H^{\otimes n} |0\rangle$. The depth parameter $p \in \mathbb{N}$ is usually fixed to be small, and we have a guarantee that the results of our algorithm become better when $p$ becomes larger. To be able to produce a circuit which computes $|\psi\rangle$, the Hamiltonian $\mathcal{H}$ may be assumed to have a special form so that we can make a circuit for $e^{\gamma \mathcal{H}}$. A well-known and important example is to compute the maximum cut of an undirected graph, i.e., to solve the MAXCUT problem.
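To illustrate how an optimisation problem becomes a diagonal Hamiltonian, the following Python sketch builds the diagonal entries $f(x)$ for MAXCUT on a small graph and finds the minimum eigenvalue by brute force. It is only an illustration: the sign convention (taking $f$ to be the negated cut size, so that minimising $f$ maximises the cut), the example graph, and all function names are our own assumptions, and brute-force enumeration is of course not what QAOA does.

```python
from itertools import product


def cut_size(edges, x):
    # Number of edges whose endpoints lie on different sides of the cut x.
    return sum(1 for (u, v) in edges if x[u] != x[v])


def f(edges, x):
    # Assumed sign convention: minimising f maximises the cut.
    return -cut_size(edges, x)


def diagonal_hamiltonian(edges, n):
    # Diagonal entries f(x), indexed by the bit strings x in {0,1}^n.
    return {x: f(edges, x) for x in product((0, 1), repeat=n)}


# Hypothetical example graph: the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
diag = diagonal_hamiltonian(edges, 4)
min_eigenvalue = min(diag.values())
minimisers = sorted(x for x, value in diag.items() if value == min_eigenvalue)
```

Because the Hamiltonian is diagonal, its eigenvalues are exactly the values $f(x)$ and its eigenvectors are the basis states $|x\rangle$; for the 4-cycle the minimisers are the two alternating assignments, which cut all four edges.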

Our implementation for QAOA on the MAXCUT problem is presented in the file QAOA.idr and an excerpt is shown in Figure 9. The problem depends on the graph G for which we want the maximum cut, a depth parameter p, and some real parameters $\beta_i, \gamma_i$.

In our implementation, we have a function QAOA_Unitary, that takes these parameters as input and produces a unitary circuit that may be used to prepare the state $|\psi\rangle$ when applied to the initial state $|0\rangle^{\otimes n}$. We then measure this state $|\psi\rangle$ and present the result (a cut of the graph in the obvious binary encoding) to an optimiser. Our optimiser is implemented by the function classicalOptimisation that uses all observable information from all previous runs (which amounts to the values of the parameters $\beta_i, \gamma_i$ and the value of the cuts that have been previously obtained through quantum measurements) to compute the subsequent rotation parameters $\beta_i, \gamma_i$ that we will use for the next iteration. The type of this function indicates that it uses the IO monad: this is because we wish to allow the function to use probabilistic optimisation algorithms or even external tools. One of the simplest implementations of this function chooses the rotation parameters at random.

The interplay between the classical and the quantum part is presented in Figure 9. The function QAOA takes as input a natural number k representing how many times the whole routine is repeated, the depth p of the circuit, and the graph G on which to compute the cut. Notice that the call to the quantum device is isolated inside the run function.
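The control flow of this classical-quantum loop can be sketched in a few lines of Python. This is a hedged illustration of the structure only: `run_quantum_stub` is a hypothetical stand-in that returns random bits instead of preparing and measuring $|\psi\rangle$, the parameter range $[0, \pi]$ is our own choice, and the optimiser is the random strategy mentioned above.

```python
import math
import random


def cut_size(edges, cut):
    # Number of edges crossing the cut (cut[v] is the side of vertex v).
    return sum(1 for (u, v) in edges if cut[u] != cut[v])


def classical_optimisation(p, previous_info, rng):
    # Simplest strategy mentioned in the text: ignore previous_info and
    # choose the rotation parameters at random (range is an assumption).
    betas = [rng.uniform(0.0, math.pi) for _ in range(p)]
    gammas = [rng.uniform(0.0, math.pi) for _ in range(p)]
    return betas, gammas


def run_quantum_stub(n, betas, gammas, rng):
    # Stand-in for the run block (newQubits, applyUnitary, measureAll).
    # NOT a quantum simulation: it simply returns n random bits.
    return tuple(rng.randint(0, 1) for _ in range(n))


def qaoa(k, p, n, edges, rng):
    # Requires k >= 1 so that at least one cut has been observed.
    history = []
    for _ in range(k):
        betas, gammas = classical_optimisation(p, history, rng)
        cut = run_quantum_stub(n, betas, gammas, rng)
        history.append((betas, gammas, cut))
    # Analogue of bestCut: keep the largest cut observed in any iteration.
    return max((cut for _, _, cut in history), key=lambda c: cut_size(edges, c))


edges = [(0, 1), (1, 2), (2, 0)]  # hypothetical example: a triangle
best = qaoa(k=20, p=2, n=3, edges=edges, rng=random.Random(1))
```

As in Figure 9, the only place where the quantum device would be contacted is a single call per iteration; everything else is ordinary classical code operating on the measurement results.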

# 7 Related Work

In this section we compare Qimaera with other existing quantum programming languages that are implemented in software. We omit comparisons with quantum type systems that do not have a software implementation. We provide a feature comparison with some quantum programming languages in Table 2 and we now clarify the meaning of some of the selected features.

By Type Safety we mean that the language can statically detect (and reject) erroneous programs which duplicate quantum data. General Recursion is the ability to express recursive (possibly non-terminating) programs and almost-surely-terminating programs, such as RUS (see §5.2). Measurements is the ability to use the outcomes of quantum measurements in the control flow of programs.

```
QAOA_Unitary : {n : Nat} -> (betas : Vect p Double)
                         -> (gammas : Vect p Double)
                         -> (graph : Graph n) -> Unitary n

classicalOptimisation : {p : Nat}
                     -> (graph : Graph n)
                     -> (previous_info : Vect k (Vect p Double,
                                                 Vect p Double, Cut n))
                     -> IO (Vect p Double, Vect p Double)

QAOA' : QuantumOp t =>
        {n : Nat} ->
        (k : Nat) -> (p : Nat) -> (graph : Graph n) ->
        IO (Vect k (Vect p Double, Vect p Double, Cut n))
QAOA' 0 p graph = pure []
QAOA' (S k) p graph = do
  previous_info <- QAOA' {t} k p graph
  (betas, gammas) <- classicalOptimisation graph previous_info
  let circuit = QAOA_Unitary betas gammas graph
  cut <- run (do
               qs <- newQubits {t} n
               qs <- applyUnitary qs circuit
               measureAll qs
               )
  pure $ (betas, gammas, cut) :: previous_info

QAOA : QuantumOp t => {n : Nat} -> (k : Nat) -> (p : Nat) ->
       Graph n -> IO (Cut n)
QAOA k p graph = do
  res <- QAOA' {t} k p graph
  let cuts = map (\(_, _, cut) => cut) res
  let (cut, size) = bestCut graph cuts
  pure cut
```
Fig. 9. Qimaera implementation (excerpt) for the QAOA algorithm solving the MAX-CUT problem.

Promotion of Measurements is the ability to integrate the outcomes of quantum measurements as a native classical type (e.g., Bool): this essentially allows us to switch from a quantum mode of operation into a classical one and allows us to use both quantum and classical programming paradigms; it may be roughly understood as corresponding to the promotion rule of linear logic [16]. For Higher-order Functions we distinguish between purely classical ones and mixed classical-quantum (in the second column); some languages support both, but treat the quantum ones non-linearly which may cause loss of type safety. Finally, by Effects we mean the ability to incorporate probabilistic computational effects (which are an essential part of the dynamics of hybrid classical-quantum programs) and also IO (input/output) effects into our programming workflow.

The QWIRE language [28,31] and the SQIR language [20,19] are quantum circuit languages that are embedded in the Coq proof assistant [11]. Both of these languages have access to dependent types, courtesy of Coq. The focus of these languages is mostly on verification, whereas in Qimaera we focus on programming and Idris 2 has better support for classical, quantum and effectful programming features compared to Coq. Both QWIRE and SQIR represent quantum primitives through the use of low-level specification languages that are embedded in Coq: both of these specification languages lack the ability to express quantum algorithms that require general recursion and both of them lack the ability to express quantum higher-order functions. Because of the former reason, the RUS algorithm from §5.2 cannot be expressed in QWIRE or SQIR.

Silq [9] is a standalone quantum programming language which is also type-safe and whose main notable feature is automatic uncomputation of temporary values. We currently partially support this feature, because we have clearly identified and separated the reversible fragment of quantum computation (see the Unitary type) and we can synthesise the required adjoints by calling the adjoint function. Compared to Silq, the main advantage of Qimaera is that Idris has better support for classical programming features and so we believe that Qimaera is a better choice for hybrid classical-quantum programming. In addition, Silq does not support general recursion, so it cannot express quantum algorithms that rely on this (e.g., RUS, §5.2).


Table 2. Feature comparison between Qimaera and other languages.

Quipper [18] and the Quantum IO monad (QIO) [3] are two domain-specific languages (DSLs) embedded in Haskell. Neither of them is type-safe because they do not utilise linearity and they cannot statically detect quantum programs that are physically inadmissible. However, thanks to the language similarities between Haskell and Idris, the programming style in these languages is somewhat similar to ours (e.g., all three use monads). In our view, both of these papers have been influential for the design of functional quantum programming languages.

Another recent language is Proto-Quipper-D [14], which is a type-safe circuit description language. This language is based on a novel type system which shows how linearity and dependent types can be combined. A fundamental difference between Proto-Quipper-D and Qimaera is that linearity is the default mode of operation in Proto-Quipper-D, whereas in Qimaera the default mode is non-linear. The focus in Proto-Quipper-D is on circuit description and generation and the language currently lacks effectful quantum measurements and probabilistic effects, so it cannot be used for variational quantum programming at present. Another related language is Proto-Quipper-Dyn [15]. It is similar to Proto-Quipper-D, but it lacks dependent types (which Qimaera has). On the other hand, it can handle quantum measurements and has dynamic lifting, i.e., the ability to parameterize quantum circuits based on information observed from quantum measurements. Note that Qimaera also has dynamic lifting.

Other languages include Google's Cirq [17] (a set of Python libraries), IBM's Qiskit [2] (a set of Python libraries) and Microsoft's Q# [35] (standalone). These languages offer a wide range of quantum functions and features; however, none of them are type-safe. Qimaera does not have this problem and this is indeed its main advantage over them, together with dependent types.

# 8 Future Work

For future work, it would be interesting to consider methods that would allow us to reduce some of the proof obligations that are imposed by the Unitary data type. Going beyond Idris and our library, another natural direction is to consider whether programming languages that support substructural approaches other than linearity (e.g., uniqueness types, ownership) can be used to achieve type-safe quantum programming. It would also be interesting to consider the relevance of arrows [21,5] in quantum programming. Furthermore, implementing and testing our abstract interface on an actual hybrid quantum-classical hardware environment would most likely bring additional challenges.

Acknowledgements. We thank Robert Rand for discussions about this paper. We also thank the anonymous referees for their feedback which led to multiple improvements of this paper. EJ is supported by the PEPR integrated project EPiQ and the European Project NEASQC (Grant Agreement 951821). Most of the work was done in LORIA/Inria Nancy during an Inria internship of the first author who was a student at École polytechnique. The first and last authors have changed affiliations since then.

# References


(2021). https://doi.org/10.4230/LIPIcs.ECOOP.2021.9



# Automatic Alignment in Higher-Order Probabilistic Programming Languages<sup>⋆</sup>

Daniel Lundén<sup>1</sup> () , Gizem Çaylak<sup>1</sup> , Fredrik Ronquist2,<sup>3</sup> , and David Broman<sup>1</sup>

<sup>1</sup> EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden, {dlunde,caylak,dbro}@kth.se

<sup>2</sup> Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden, fredrik.ronquist@nrm.se

<sup>3</sup> Department of Zoology, Stockholm University, Stockholm, Sweden

Abstract. Probabilistic Programming Languages (PPLs) allow users to encode statistical inference problems and automatically apply an inference algorithm to solve them. Popular inference algorithms for PPLs, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints—relevant events for the inference algorithm during the execution of a probabilistic program. Deciding the location of checkpoints is, in current PPLs, not done optimally. To solve this problem, we present a static analysis technique that automatically determines checkpoints in programs, relieving PPL users of this task. The analysis identifies a set of checkpoints that execute in the same order in every program run—they are aligned. We formalize alignment, prove the correctness of the analysis, and implement the analysis as part of the higher-order functional PPL Miking CorePPL. By utilizing the alignment analysis, we design two novel inference algorithm variants: aligned SMC and aligned lightweight MCMC. We show, through real-world experiments, that they significantly improve inference execution time and accuracy compared to standard PPL versions of SMC and MCMC.

Keywords: Probabilistic programming · Operational semantics · Static analysis.

# 1 Introduction

Probabilistic programming languages (PPLs) are languages used to encode statistical inference problems, common in research fields such as phylogenetics [39], computer vision [16], topic modeling [5], data cleaning [23], and cognitive science [15]. PPL implementations automatically solve encoded problems by applying an inference algorithm. In particular, automatic inference allows users to solve inference problems without having in-depth knowledge of inference algorithms and how to apply them. Some examples of PPLs are WebPPL [14], Birch [31], Anglican [48], Miking CorePPL [25], Turing [12], and Pyro [3].

<sup>⋆</sup> This project is financially supported by the Swedish Foundation for Strategic Research (FFL15-0032 and RIT15-0012), and also partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and the Swedish Research Council (grants 2018-04620 and 2021-04830). The research has also been carried out as part of the Vinnova Competence Center for Trustworthy Edge Computing Systems and Applications at KTH Royal Institute of Technology.

Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) are general-purpose families of inference algorithms often used for PPL implementations. These algorithms share the concept of checkpoints: relevant execution events for the inference algorithm. For SMC, the checkpoints are likelihood updates [48,14] and determine the resampling of executions. Alternatively, users must sometimes manually annotate or write the probabilistic program in a certain way to make resampling explicit [25,31]. For MCMC, checkpoints are instead random draws, which allow the inference algorithm to manipulate these draws to construct a Markov chain over program executions [47,38]. When designing SMC and MCMC algorithms for universal PPLs<sup>4</sup>, both the placement and handling of checkpoints are critical to making the inference both efficient and accurate.

For SMC, a standard inference approach is to resample at all likelihood updates [14,48]. This approach produces correct results asymptotically [24] but is highly problematic for certain models [39]. Such models require non-trivial and SMC-specific manual program rewrites to force good resampling locations and make SMC tractable. Overall, choosing the likelihood updates at which to resample significantly affects SMC execution time and accuracy.

For MCMC, a standard approach for inference in universal PPLs is lightweight MCMC [47], which constructs a Markov chain over random draws in programs. The key idea is to use an addressing transformation and a runtime database of random draws. Specifically, the database enables matching and reusing random draws between executions according to their stack traces, even if the random draws may or may not occur due to randomness during execution. However, the dynamic approach of looking up random draws in the database through their stack traces is expensive and introduces significant runtime overhead.
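The role of the runtime database can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the algorithm of [47]: addresses are explicit strings rather than stack traces, draws are uniform on [0, 1), and the acceptance step of MCMC is omitted entirely. The names `run`, `model`, and the address strings are hypothetical.

```python
import random


def run(model, database, rng):
    # Execute the model, reusing draws found in `database` (address -> value)
    # and drawing fresh values for addresses that are missing.
    trace = {}

    def sample(address):
        value = database[address] if address in database else rng.random()
        trace[address] = value
        return value

    result = model(sample)
    return result, trace


def model(sample):
    a = sample("a")
    # Stochastic branch: which draw of b happens depends on the value of a.
    b = sample("then/b") if a < 0.5 else sample("else/b")
    c = sample("c")
    return a, b, c


rng = random.Random(0)
first_result, first_trace = run(model, {}, rng)

# Propose a new value for "a" that flips the branch; keep all other draws.
proposal = dict(first_trace)
proposal["a"] = 0.9 if first_trace["a"] < 0.5 else 0.1
second_result, second_trace = run(model, proposal, rng)
```

In the second execution the draw at address `"c"` is found in the database and reused, whereas the draw of `b` in the newly taken branch has no entry and is sampled afresh. Every draw pays for a lookup, which is the kind of overhead the text refers to.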

To overcome the SMC and MCMC problems in universal PPLs, we present a static analysis technique for higher-order functional PPLs that automatically determines checkpoints in a probabilistic program that always occur in the same order in every program execution—they are aligned. We formally define alignment, formalize the alignment analysis, and prove the soundness of the analysis with respect to the alignment definition. The novelty and challenge in developing the static analysis technique is to capture alignment properties through the identification of expressions in programs that may evaluate to stochastic values and expressions that may evaluate due to stochastic branching. Stochastic branching results from if expressions with stochastic values as conditions or function applications where the function itself is stochastic. Stochastic values and branches pose a significant challenge when proving the soundness of the analysis.
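The notion of alignment can be illustrated on a toy program with a single branch. The Python sketch below checks the property dynamically over an enumerated set of runs, which is only an illustration: the analysis described in this paper is static and works on the program text, and the labels `w1`, `w2`, `w3` and the helper names are hypothetical.

```python
def program(branch, checkpoint):
    checkpoint("w1")
    if branch:  # stands in for a condition that depends on a random draw
        checkpoint("w2")
    checkpoint("w3")


def trace_of(branch):
    # Record the sequence of checkpoints encountered in one run.
    trace = []
    program(branch, trace.append)
    return trace


def same_order_in_all_runs(labels, traces):
    # Restrict every trace to the candidate labels and compare the results.
    restricted = [[c for c in trace if c in labels] for trace in traces]
    return all(r == restricted[0] for r in restricted)


traces = [trace_of(True), trace_of(False)]
aligned_ok = same_order_in_all_runs({"w1", "w3"}, traces)
all_ok = same_order_in_all_runs({"w1", "w2", "w3"}, traces)
```

Here `w1` and `w3` occur in the same order in both runs, so they form an aligned set, while `w2` sits under the stochastic branch and breaks the property when it is included.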

<sup>4</sup> A term coined by Goodman et al. [13]. Essentially, it means that the types and numbers of random variables cannot be determined statically.

We design two new inference algorithms that improve accuracy and execution time compared to current approaches. Unlike the standard SMC algorithm for PPLs [48,14], aligned SMC only resamples at aligned likelihood updates. Resampling only at aligned likelihood updates guarantees that each SMC execution resamples the same number of times, which makes expensive global termination checks redundant [25]. We evaluate aligned SMC on two diversification models from Ronquist et al. [39] and a state-space model for aircraft localization, demonstrating significantly improved inference accuracy and execution time compared to traditional SMC. Both diversification models, constant rate birth-death (CRBD) and cladogenetic diversification rate shift (ClaDS), are used in real-world settings and are of considerable interest to evolutionary biologists [33,28]. The documentation of both Anglican [48] and Turing [12] acknowledges the importance of alignment for SMC and states that all likelihood updates must be aligned. However, Turing and Anglican neither formalize nor enforce this property; it is up to the users to manually guarantee it, often requiring non-standard program rewrites [39].

We also design aligned lightweight MCMC, a new version of lightweight MCMC [47]. Aligned lightweight MCMC constructs a Markov chain over the program using the aligned random draws as synchronization points to match and reuse aligned random draws and a subset of unaligned draws between executions. Aligned lightweight MCMC does not require a runtime database of random draws and therefore reduces runtime overhead. We evaluate aligned lightweight MCMC for latent Dirichlet allocation (LDA) [5] and CRBD [39], demonstrating significantly reduced execution times and no decrease in inference accuracy. Furthermore, automatic alignment is orthogonal to and easily combines with the lightweight MCMC optimizations introduced by Ritchie et al. [38].

We implement the analysis, aligned SMC, and aligned lightweight MCMC in Miking CorePPL [25,7]. In addition to analyzing stochastic if-branching, the implementation analyzes stochastic branching at a standard pattern-matching construct. Compared to if expressions, the pattern-matching construct requires a more sophisticated analysis of the pattern and the value matched against it to determine if the pattern-matching causes a stochastic branch.

In summary, we make the following contributions.


Section 7 describes the evaluation and discusses its results. The paper also has an accompanying artifact that supports the evaluation [26]. Section 8 discusses related work and Section 9 concludes. Next, Section 2 considers a simple motivating example to illustrate the key ideas. Section 3 introduces syntax and semantics for the calculus used to formalize the alignment analysis.

An extended version of the paper is also available at arXiv [27]. We use the symbol † in the text to indicate that more information (e.g., proofs) is available in the extended version.

# 2 A Motivating Example

This section presents a motivating example that illustrates the key alignment ideas in relation to aligned SMC (Section 2.1) and aligned lightweight MCMC (Section 2.2). We assume basic knowledge of probability theory. Knowledge of PPLs is helpful, but not a strict requirement. The book by van de Meent et al. [46] provides a good introduction to PPLs.

Probabilistic programs encode Bayesian statistical inference problems with two fundamental constructs: assume and weight. The assume construct defines random variables, which make execution nondeterministic. Intuitively, a probabilistic program then encodes a probability distribution over program executions (the prior distribution), and it is possible to sample from this distribution by executing the program with random sampling at assumes. The weight construct updates the likelihood of individual executions. Updating likelihoods for executions modifies the probability distribution induced by assumes, and the inference problem encoded by the program is to determine or approximate this modified distribution (the posterior distribution). The main purpose of weight in real-world models is to condition executions on observed data.<sup>5</sup>

Consider the probabilistic program in Fig. 1a. The program is contrived and purposefully constructed to compactly illustrate alignment, but the real-world diversification models in Ronquist et al. [39] that we also consider in Section 7 inspired the program's general structure. The program defines (line 1) and returns (line 18) a Gamma-distributed random variable rate. Figure 1b illustrates the Gamma distribution. To modify the likelihood for values of rate, the program executes the iter function (line 10) three times, and the survives function (line 2) a random number of times n (line 13) within each iter call.

Conceptually, to infer the posterior distribution of the program, we execute the program infinitely many times. In each execution, we draw samples for the random variables defined at assume, and accumulate the likelihood at weight. The return value of the execution, weighted by the accumulated likelihood, represents one sample from the posterior distribution. Fig. 1c shows a histogram of such weighted samples of rate resulting from a large number of executions of Fig. 1a. The fundamental inference algorithm that produces such weighted samples is called likelihood weighting (a type of importance sampling [32]). We see that, compared to the prior distribution for rate in Fig. 1b, the posterior is more sharply peaked due to the likelihood modifications.

<sup>5</sup> A number of more specialized constructs for likelihood updating are also available in various PPLs, for example observe [48,14] and condition [14].

Fig. 1: A simple example illustrating alignment. Fig. (a) gives a probabilistic program using functional-style PPL pseudocode. Fig. (b) illustrates the Gamma(2, 2) probability density function. Fig. (c) illustrates a histogram over weighted rate samples produced by running the program in (a) a large number of times. Fig. (d) shows two line number sequences w<sup>1</sup> and w<sup>2</sup> of weights encountered in two program runs (top) and how to align them (bottom). Fig. (e) shows two line number sequences s<sup>1</sup> and s<sup>2</sup> of assumes encountered in two program runs (top) and how to align them (bottom).

#### 2.1 Aligned SMC

Likelihood weighting can only handle the simplest of programs. In Fig. 1a, a problem with likelihood weighting is that we assign the weight 0 to many executions at line 8. These executions contribute nothing to the final distribution. SMC solves this by executing many program instances concurrently and occasionally resampling them (with replacement) based on their current likelihoods. Resampling discards executions with lower weights (in the worst case, 0) and replaces them with executions with higher weights. The most common approach in popular PPLs is to resample just after likelihood updates (i.e., calls to weight).
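As a rough illustration of the resampling step described above, the following Python sketch draws a new population of executions with replacement, with probabilities proportional to the current likelihoods. The function name `resample` and the representation of executions as plain labels are assumptions made for this illustration only; they are not taken from any PPL implementation.

```python
import random

def resample(particles, weights, rng=random):
    """Multinomial resampling: draw len(particles) particles with
    replacement, with probability proportional to the current weights."""
    if not any(w > 0 for w in weights):
        raise ValueError("all particles have weight zero")
    return rng.choices(particles, weights=weights, k=len(particles))

# Hypothetical population in which two executions already received weight 0.
particles = ["run-a", "run-b", "run-c", "run-d"]
weights = [0.0, 0.3, 0.0, 0.7]
survivors = resample(particles, weights, random.Random(0))
```

Executions with weight 0 are never selected, so the population after resampling consists only of copies of `run-b` and `run-d`.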

Resampling at all calls to weight in Fig. 1a is suboptimal. The best option is instead to only resample at line 12. This is because executions encounter lines 5 and 8 a random number of times due to the stochastic branch at line 3, while they encounter line 12 a fixed number of times. As a result of resampling at lines 5 and 8, executions become unaligned; in each resampling, executions can have reached either line 5, line 8, or line 12. On the other hand, if we resample only at line 12, all executions will always have reached line 12 for the same iteration of iter in every resampling. Intuitively, this is a sensible approach since, when resampling, executions have progressed the same distance through the program. We say that the weight at line 12 is aligned, and resampling only at aligned weights results in our new inference approach called aligned SMC. Fig. 1d visualizes the weight alignment for two sample executions of Fig. 1a.

#### 2.2 Aligned Lightweight MCMC

Another improvement over likelihood weighting is to construct a Markov chain over program executions. It is beneficial to propose new executions in the Markov chain by making small, rather than large, modifications to the previous execution. The lightweight MCMC [47] algorithm does this by redrawing a single random draw in the previous execution, and then reusing as many other random draws as possible. Random draws in the current and previous executions match through stack traces—the sequences of applications leading up to a random draw. Consider the random draw at line 13 in Fig. 1a. It is called exactly three times in every execution. If we identify applications and assumes by line numbers, we get the stack traces [17, 13], [17, 15, 13], and [17, 15, 15, 13] for these three assumes in every execution. Consequently, lightweight MCMC can reuse these draws by storing them in a database indexed by stack traces.

The stack trace indexing in lightweight MCMC is overly complicated when reusing aligned random draws. Note that the assumes at lines 1 and 13 in Fig. 1a are aligned, while the assume at line 4 is unaligned. Fig. 1e visualizes the assume alignment for two sample executions of Fig. 1a. Aligned random draws occur in the same order in every execution, and are therefore trivial to match and reuse between executions through indexing by counting. The appeal of stack trace indexing is that it additionally allows reusing a subset of unaligned draws.

A key insight in this paper is that aligned random draws can also act as synchronization points in the program to allow reusing unaligned draws without a stack trace database. After an aligned draw, we reuse unaligned draws occurring up until the next aligned draw, as long as they syntactically originate at the same assume as the corresponding unaligned draws in the previous execution. As soon as an unaligned draw does not originate from the same assume as in the previous execution, we redraw all remaining unaligned draws up until the next aligned draw. Instead of a trace-indexed database, this approach requires storing a list of unaligned draws (tagged with identifiers of the assumes at which they originated) for each execution segment in between aligned random draws. For example, for the execution s<sup>1</sup> in Fig. 1e, we store lists of unaligned Bernoulli random draws from line 4 for each execution segment in between the three aligned random draws at line 13. If a Poisson random draw n at line 13 does not change or decreases, we can reuse the stored unaligned Bernoulli draws up until the next Poisson random draw as survives executes n or fewer times. If the drawn n instead increases to n′, we can again reuse all stored Bernoulli draws, but must supplement them with new Bernoulli draws to reach n′ draws in total.
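The reuse rule for one execution segment can be sketched in Python as follows. This is only an illustration of the matching rule described above, not the implementation evaluated in the paper: the class `SegmentReplayer`, the identifier `"line4"`, and the helper `bernoulli_half` are hypothetical names introduced here.

```python
import random

class SegmentReplayer:
    """Replays the unaligned draws stored for one execution segment
    (the part of an execution between two consecutive aligned draws)."""

    def __init__(self, stored, rng):
        self.stored = stored    # [(assume_id, value), ...] from the previous execution
        self.rng = rng
        self.pos = 0
        self.reusing = True     # becomes False at the first mismatch
        self.current = []       # draws recorded for the new execution

    def draw(self, assume_id, sampler):
        in_range = self.pos < len(self.stored)
        if self.reusing and in_range and self.stored[self.pos][0] == assume_id:
            value = self.stored[self.pos][1]   # same originating assume: reuse
        else:
            self.reusing = False               # mismatch or stored list exhausted
            value = sampler(self.rng)          # redraw from here on
        self.pos += 1
        self.current.append((assume_id, value))
        return value

def bernoulli_half(rng):
    return rng.random() < 0.5

# Hypothetical stored segment: three Bernoulli draws from the assume at line 4.
stored = [("line4", True), ("line4", False), ("line4", True)]

# The new segment needs five draws: three are reused, two are new.
replay = SegmentReplayer(stored, random.Random(1))
grown = [replay.draw("line4", bernoulli_half) for _ in range(5)]

# The new segment needs only two draws: both are reused.
replay = SegmentReplayer(stored, random.Random(1))
shrunk = [replay.draw("line4", bernoulli_half) for _ in range(2)]
```

A list per segment suffices because matching is purely positional within the segment; no stack traces are computed or looked up.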

As we show in Section 7, using aligned draws as synchronization points works very well in practice and avoids the runtime overhead of the lightweight MCMC database. However, manually identifying aligned parts of programs and rewriting them so that inference can make use of alignment is, if even possible, tedious, error-prone, and impractical for large programs. This paper presents an automated approach to identifying aligned parts of programs. Combining static alignment analysis and using aligned random draws as synchronization points form the key ideas of the new algorithm that we call aligned lightweight MCMC.

# 3 Syntax and Semantics

In preparation for the alignment analysis in Section 4, we require an idealized base calculus capturing the key features of expressive PPLs. This section introduces such a calculus with a formal syntax (Section 3.1) and semantics (Section 3.2). We assume a basic understanding of the lambda calculus (see, e.g., Pierce [37] for a complete introduction). Section 6 further describes extending the idealized calculus and the analysis in Section 4 to a full-featured PPL.

#### 3.1 Syntax

We use the untyped lambda calculus as the base for our calculus. We also add let expressions for convenience, and if expressions to allow intrinsic booleans to affect control flow. The calculus is a subset of the language used in Fig. 1a. We inductively define terms t and values v as follows.

#### Definition 1 (Terms and values).

$$\begin{array}{l}
\mathbf{t} ::= x \mid c \mid \lambda x.\ \mathbf{t} \mid \mathbf{t}\ \mathbf{t} \mid \mathbf{let}\ x = \mathbf{t}\ \mathbf{in}\ \mathbf{t} \mid \mathbf{if}\ \mathbf{t}\ \mathbf{then}\ \mathbf{t}\ \mathbf{else}\ \mathbf{t} \mid \mathbf{assume}\ \mathbf{t} \mid \mathbf{weight}\ \mathbf{t} \\
\mathbf{v} ::= c \mid \langle \lambda x.\ \mathbf{t}, \rho \rangle \\
x, y \in X \quad \rho \in P \quad c \in C \quad \{\text{false}, \text{true}, ()\} \cup \mathbb{R} \cup D \subseteq C.
\end{array} \tag{1}$$

X is a countable set of variable names, C a set of intrinsic values and operations, and D ⊂ C a set of probability distributions. The set P contains all evaluation environments ρ, that is, partial functions mapping names in X to values v. We use T and V to denote the set of all terms and values, respectively.

Values v are intrinsics or closures, where closures are abstractions with an environment binding free variables in the abstraction body. We require that C include booleans, the unit value (), and real numbers. The reason is that weight takes a real number as its argument and returns (), and that if expression conditions are booleans. Furthermore, probability distributions are often over booleans and real numbers. For example, we can include the normal distribution constructor N ∈ C that takes real numbers as arguments and produces normal distributions over real numbers, so that N 0 1 ∈ D is the standard normal distribution. We often write functions in C in infix position or with standard function application syntax for readability. For example, 1 + 2 with + ∈ C means + 1 2, and N (0, 1) means N 0 1. Additionally, we use the shorthand t<sub>1</sub>; t<sub>2</sub> for let \_ = t<sub>1</sub> in t<sub>2</sub>, where \_ is the do-not-care symbol. That is, t<sub>1</sub>; t<sub>2</sub> evaluates t<sub>1</sub> for side effects only before evaluating t<sub>2</sub>. Finally, the untyped lambda calculus supports recursion through fixed-point combinators. We encapsulate this in the shorthand let rec f = λx.t<sub>1</sub> in t<sub>2</sub> to conveniently define recursive functions.

Fig. 2: A probabilistic program tgeo [25], illustrating (1). Fig. (a) gives the program, and (b) the corresponding probability distributions. In (b), the y-axis gives the probability, and the x-axis gives the outcome (the number of coin flips). The upper part of (b) excludes the shaded weight at line 4 in (a).

The assume and weight constructs are PPL-specific. We define random variables from intrinsic probability distributions with assume (also known as sample in PPLs with sampling-based inference). For example, the term let x = assume N (0, 1) in t defines x as a random variable with a standard normal distribution in t. Boolean random variables combined with if expressions result in stochastic branching—causing the alignment problem. Lastly, weight (also known as factor or score) is a standard construct for likelihood updating (see, e.g., Borgström et al. [6]). Next, we illustrate and formalize a semantics for (1).

#### 3.2 Semantics

Consider the small probabilistic program tgeo ∈ T in Fig. 2a. The program encodes the standard geometric distribution via a function geometric, which recursively flips a fair coin (a Bernoulli(0.5) distribution) at line 2 until the outcome is false (i.e., tails). At that point, the program returns the total number of coin flips, including the last tails flip. The upper part of Fig. 2b illustrates the result distribution for an infinite number of program runs with line 4 ignored.

To illustrate the effect of weight, consider tgeo with line 4 included. This weight modifies the likelihood with a factor 1.5 each time the flip outcome is true (i.e., heads). Intuitively, this emphasizes larger return values, illustrated in the lower part of Fig. 2b. Specifically, the (unnormalized) probability of seeing n coin flips is 0.5<sup>n</sup> · 1.5<sup>n−1</sup>, compared to 0.5<sup>n</sup> for the unweighted version. The factor 1.5<sup>n−1</sup> is the result of the calls to weight.
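The effect of the weight can also be checked numerically. The short Python sketch below tabulates the unnormalized probabilities given above and normalizes them; the truncation point `N` and all function names are assumptions made for this illustration.

```python
def unnormalized(n, weighted=True):
    """Unnormalized probability of seeing exactly n coin flips (n >= 1)."""
    p = 0.5 ** n
    return p * 1.5 ** (n - 1) if weighted else p

N = 200  # truncation point (an assumption; the tail beyond it is negligible)
z_weighted = sum(unnormalized(n) for n in range(1, N + 1))
z_plain = sum(unnormalized(n, weighted=False) for n in range(1, N + 1))

# Normalized probabilities for the first few outcomes.
weighted_probs = {n: unnormalized(n) / z_weighted for n in range(1, 6)}
plain_probs = {n: unnormalized(n, weighted=False) / z_plain for n in range(1, 6)}
```

The weighted normalizing constant is 0.5 · Σ<sub>k≥0</sub> 0.75<sup>k</sup> = 2, so the normalized weighted distribution assigns probability 0.25 · 0.75<sup>n−1</sup> to n coin flips, compared to 0.5<sup>n</sup> without the weight. This shifts probability mass toward larger return values.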

We now introduce a big-step operational semantics for single runs of programs t. Such a semantics is essential to formalize the probability distributions encoded by probabilistic programs (e.g., Fig. 2b for Fig. 2a) and to prove the correctness of PPL inference algorithms. For example, Borgström et al. [6] define a PPL calculus and semantics similar to this paper and formally prove the correctness of an MCMC algorithm. Another example is Lundén et al. [24], who also define a

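$$\begin{gathered}
\frac{}{\rho \vdash x \;{}^{[]}\!\Downarrow^{1}_{[]}\; \rho(x)}\,(\text{Var}) \qquad
\frac{}{\rho \vdash c \;{}^{[]}\!\Downarrow^{1}_{[]}\; c}\,(\text{Const}) \qquad
\frac{}{\rho \vdash \lambda x.\,\mathbf{t} \;{}^{[]}\!\Downarrow^{1}_{[]}\; \langle \lambda x.\,\mathbf{t}, \rho \rangle}\,(\text{Lam}) \\[1.5ex]
\frac{\rho \vdash \mathbf{t}_1 \;{}^{s_1}\!\Downarrow^{w_1}_{l_1}\; \langle \lambda x.\,\mathbf{t}, \rho' \rangle \quad \rho \vdash \mathbf{t}_2 \;{}^{s_2}\!\Downarrow^{w_2}_{l_2}\; v_2 \quad \rho', x \mapsto v_2 \vdash \mathbf{t} \;{}^{s_3}\!\Downarrow^{w_3}_{l_3}\; v}{\rho \vdash \mathbf{t}_1\ \mathbf{t}_2 \;{}^{s_1 \| s_2 \| s_3}\!\Downarrow^{w_1 \cdot w_2 \cdot w_3}_{l_1 \| l_2 \| l_3}\; v}\,(\text{App}) \\[1.5ex]
\frac{\rho \vdash \mathbf{t}_1 \;{}^{s_1}\!\Downarrow^{w_1}_{l_1}\; c_1 \quad \rho \vdash \mathbf{t}_2 \;{}^{s_2}\!\Downarrow^{w_2}_{l_2}\; c_2}{\rho \vdash \mathbf{t}_1\ \mathbf{t}_2 \;{}^{s_1 \| s_2}\!\Downarrow^{w_1 \cdot w_2}_{l_1 \| l_2}\; \delta(c_1, c_2)}\,(\text{Const-App}) \qquad
\frac{\rho \vdash \mathbf{t}_1 \;{}^{s_1}\!\Downarrow^{w_1}_{l_1}\; v_1 \quad \rho, x \mapsto v_1 \vdash \mathbf{t}_2 \;{}^{s_2}\!\Downarrow^{w_2}_{l_2}\; v}{\rho \vdash \mathbf{let}\ x = \mathbf{t}_1\ \mathbf{in}\ \mathbf{t}_2 \;{}^{s_1 \| s_2}\!\Downarrow^{w_1 \cdot w_2}_{l_1 \| [x] \| l_2}\; v}\,(\text{Let}) \\[1.5ex]
\frac{\rho \vdash \mathbf{t}_1 \;{}^{s_1}\!\Downarrow^{w_1}_{l_1}\; \text{true} \quad \rho \vdash \mathbf{t}_2 \;{}^{s_2}\!\Downarrow^{w_2}_{l_2}\; v_2}{\rho \vdash \mathbf{if}\ \mathbf{t}_1\ \mathbf{then}\ \mathbf{t}_2\ \mathbf{else}\ \mathbf{t}_3 \;{}^{s_1 \| s_2}\!\Downarrow^{w_1 \cdot w_2}_{l_1 \| l_2}\; v_2}\,(\text{If-True}) \qquad
\frac{\rho \vdash \mathbf{t}_1 \;{}^{s_1}\!\Downarrow^{w_1}_{l_1}\; \text{false} \quad \rho \vdash \mathbf{t}_3 \;{}^{s_3}\!\Downarrow^{w_3}_{l_3}\; v_3}{\rho \vdash \mathbf{if}\ \mathbf{t}_1\ \mathbf{then}\ \mathbf{t}_2\ \mathbf{else}\ \mathbf{t}_3 \;{}^{s_1 \| s_3}\!\Downarrow^{w_1 \cdot w_3}_{l_1 \| l_3}\; v_3}\,(\text{If-False}) \\[1.5ex]
\frac{\rho \vdash \mathbf{t} \;{}^{s}\!\Downarrow^{w}_{l}\; d \quad w' = f_d(c)}{\rho \vdash \mathbf{assume}\ \mathbf{t} \;{}^{s \| [c]}\!\Downarrow^{w \cdot w'}_{l}\; c}\,(\text{Assume}) \qquad
\frac{\rho \vdash \mathbf{t} \;{}^{s}\!\Downarrow^{w}_{l}\; w'}{\rho \vdash \mathbf{weight}\ \mathbf{t} \;{}^{s}\!\Downarrow^{w \cdot w'}_{l}\; ()}\,(\text{Weight})
\end{gathered}$$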

Fig. 3: A big-step operational semantics for terms, formalizing single runs of programs t ∈ T. The operation ρ, x ↦ v produces a new environment extending ρ with a binding v for x. For each distribution d ∈ D, f<sub>d</sub> is its probability density or probability mass function, which encodes the relative probability of drawing particular values from the distribution. For example, f<sub>Bernoulli(0.3)</sub>(true) = 0.3 and f<sub>Bernoulli(0.3)</sub>(false) = 1 − 0.3 = 0.7. We use · to denote multiplication.

similar calculus and semantics and prove the correctness of PPL SMC algorithms. In particular, the correctness of our aligned SMC algorithm (Section 5.1) follows from this proof. The purpose of the semantics in this paper is to formalize alignment and prove the soundness of our analysis in Section 4. We use a big-step semantics, as the finer granularity of a small-step semantics is redundant for this purpose. We begin with a definition for intrinsics.

Definition 2 (Intrinsic functions). For every c ∈ C, we attach an arity |c| ∈ ℕ. We define a partial function δ : C × C → C such that δ(c, c<sub>1</sub>) = c<sub>2</sub> is defined for |c| > 0. For all c, c<sub>1</sub>, and c<sub>2</sub> such that δ(c, c<sub>1</sub>) = c<sub>2</sub>, |c<sub>2</sub>| = |c| − 1.

Intrinsic functions are curried and produce intrinsics or intrinsic functions of one arity less through δ. For example, for + ∈ C, we have δ(δ(+, 1), 2) = 3, |+| = 2, |δ(+, 1)| = 1, and |δ(δ(+, 1), 2)| = 0. Next, randomness in our semantics is deterministic via a trace of random draws in the style of Kozen [22].
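Definition 2 can be illustrated with a small Python sketch. The `Intrinsic` class and the encoding of arity-0 intrinsics as plain Python values are assumptions of this sketch and not part of the calculus itself.

```python
class Intrinsic:
    """An intrinsic function of positive arity, possibly partially applied."""

    def __init__(self, fn, arity, args=()):
        self.fn, self.arity, self.args = fn, arity, args

def arity(c):
    # Assumption: plain Python values play the role of intrinsics with arity 0.
    return c.arity if isinstance(c, Intrinsic) else 0

def delta(c, c1):
    """delta(c, c1) is only defined when |c| > 0; the result has arity |c| - 1."""
    if arity(c) == 0:
        raise ValueError("delta is undefined for intrinsics of arity 0")
    args = c.args + (c1,)
    if c.arity == 1:
        return c.fn(*args)                     # fully applied: compute the result
    return Intrinsic(c.fn, c.arity - 1, args)  # still waiting for more arguments

plus = Intrinsic(lambda a, b: a + b, 2)
partial = delta(plus, 1)
result = delta(partial, 2)
```

Here `arity(plus)`, `arity(partial)`, and `arity(result)` are 2, 1, and 0, mirroring the example for + above.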

Definition 3 (Traces). The set S of traces is the set such that, for all s ∈ S, s is a sequence of intrinsics from C with arity 0.

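In the following, we use the notation [c<sub>1</sub>, c<sub>2</sub>, . . . , c<sub>n</sub>] for sequences and ‖ for sequence concatenation. For example, [c<sub>1</sub>, c<sub>2</sub>] ‖ [c<sub>3</sub>, c<sub>4</sub>] = [c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>]. We also use subscripts to select elements in a sequence, e.g., [c<sub>1</sub>, c<sub>2</sub>, c<sub>3</sub>, c<sub>4</sub>]<sub>2</sub> = c<sub>2</sub>. In practice, traces are often sequences of real numbers, e.g., [1.1, 3.2, 8.4] ∈ S.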

Fig. 3 presents the semantics as a relation ρ ⊢ t <sup>s</sup>⇓<sup>w</sup><sub>l</sub> v over P × T × S × ℝ × L × V. L is the set of sequences over X, i.e., sequences of names. For example, [x, y, z] ∈ L, where x, y, z ∈ X. We use l ∈ L to track the sequence of let-bindings during evaluation. For example, evaluating let x = 1 in let y = 2 in x + y results in l = [x, y]. In Section 4, we use the sequence of encountered let-bindings to define alignment. For simplicity, from now on we assume that bound variables are always unique (i.e., variable shadowing is impossible).

It is helpful to think of ρ, t, and s as the input to ⇓, and l, w, and v as the output. In the environment ρ, the term t, with trace s, evaluates to v, encounters the sequence of let bindings l, and accumulates the weight w. The trace s is the sequence of all random draws, and each random draw in (Assume) consumes precisely one element of s. The rule (Let) tracks the sequence of bindings by adding x at the correct position in l. The number w is the likelihood of the execution: the probability density of all draws in the program, registered at (Assume), combined with direct likelihood modifications, registered at (Weight). The remaining aspects of the semantics are standard (see, e.g., Kahn [20]). To give an example of the semantics, we have ∅ ⊢ tgeo <sup>[true,true,true,false]</sup>⇓<sup>0.5·1.5·0.5·1.5·0.5·1.5·0.5</sup><sub>[geometric,x,x,x,x]</sub> 4 for the particular execution of tgeo making three recursive calls. Next, we formalize and apply the alignment analysis to (1).
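To make the roles of s, w, and l concrete, the following Python sketch implements a small interpreter in the spirit of the semantics described above. It rests on several assumptions that are not part of the paper: terms are encoded as tuples, intrinsic functions are curried Python callables, let rec is treated as a primitive rather than through a fixed-point combinator, the do-not-care name `_` is not recorded in l (to match the example sequence in the text), and the encoding of tgeo is our own reading of the description of Fig. 2a.

```python
from typing import NamedTuple

class Closure(NamedTuple):
    param: str
    body: tuple
    env: dict

class Bernoulli:
    def __init__(self, p):
        self.p = p
    def density(self, c):                    # plays the role of f_d
        return self.p if c else 1.0 - self.p

def evaluate(env, t, s):
    """Evaluate term t in env with trace s.
    Returns (value, weight, let_names, unconsumed_trace)."""
    kind = t[0]
    if kind == "var":
        return env[t[1]], 1.0, [], s
    if kind == "const":
        return t[1], 1.0, [], s
    if kind == "lam":
        return Closure(t[1], t[2], env), 1.0, [], s
    if kind == "app":
        f, w1, l1, s = evaluate(env, t[1], s)
        a, w2, l2, s = evaluate(env, t[2], s)
        if isinstance(f, Closure):
            v, w3, l3, s = evaluate({**f.env, f.param: a}, f.body, s)
            return v, w1 * w2 * w3, l1 + l2 + l3, s
        return f(a), w1 * w2, l1 + l2, s     # intrinsic application (delta)
    if kind == "let":
        _, x, t1, t2 = t
        v1, w1, l1, s = evaluate(env, t1, s)
        v, w2, l2, s = evaluate({**env, x: v1}, t2, s)
        here = [] if x == "_" else [x]       # assumption: "_" is not recorded
        return v, w1 * w2, l1 + here + l2, s
    if kind == "letrec":                     # assumption: primitive let rec
        _, f, x, body, t2 = t
        rec_env = dict(env)
        rec_env[f] = Closure(x, body, rec_env)
        v, w, l, s = evaluate(rec_env, t2, s)
        return v, w, [f] + l, s
    if kind == "if":
        c, w1, l1, s = evaluate(env, t[1], s)
        v, w2, l2, s = evaluate(env, t[2] if c else t[3], s)
        return v, w1 * w2, l1 + l2, s
    if kind == "assume":
        d, w, l, s = evaluate(env, t[1], s)
        c, rest = s[0], s[1:]                # consume exactly one trace element
        return c, w * d.density(c), l, rest
    if kind == "weight":
        factor, w, l, s = evaluate(env, t[1], s)
        return (), w * factor, l, s
    raise ValueError(f"unknown term: {kind}")

plus = lambda a: lambda b: a + b             # curried intrinsic +
unit = ("const", ())
body = ("let", "x", ("assume", ("const", Bernoulli(0.5))),
        ("if", ("var", "x"),
         ("let", "_", ("weight", ("const", 1.5)),
          ("app", ("app", ("const", plus), ("const", 1)),
                  ("app", ("var", "geometric"), unit))),
         ("const", 1)))
t_geo = ("letrec", "geometric", "_", body,
         ("app", ("var", "geometric"), unit))

value, weight, lets, rest = evaluate({}, t_geo, [True, True, True, False])
```

For this trace, the sketch yields the value 4, the weight 0.5·1.5·0.5·1.5·0.5·1.5·0.5, and the sequence [geometric, x, x, x, x], in agreement with the example above.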

# 4 Alignment Analysis

This section presents the main contribution of this paper: automatic alignment in PPLs. Section 4.1 introduces A-normal form and gives a precise definition of alignment. Section 4.2 formalizes and proves the correctness of the alignment analysis. Lastly, Section 4.3 discusses a dynamic version of alignment.

#### 4.1 A-Normal Form and Alignment

To reason about all subterms t′ of a program t and to enable the analysis in Section 4.2, we need to uniquely label all subterms. A straightforward approach is to use variable names within the program itself as labels (remember that we assume bound variables are always unique). This leads us to the standard A-normal form (ANF) representation of programs [11].

#### Definition 4 (A-normal form).

$$\begin{aligned} \mathbf{t}_{\text{ANF}} &::= x \mid \mathbf{let}\ x = \mathbf{t}'_{\text{ANF}}\ \mathbf{in}\ \mathbf{t}_{\text{ANF}} \\ \mathbf{t}'_{\text{ANF}} &::= x \mid c \mid \lambda x.\ \mathbf{t}_{\text{ANF}} \mid x\ y \\ &\phantom{::=}\ \mid \mathbf{if}\ x\ \mathbf{then}\ \mathbf{t}_{\text{ANF}}\ \mathbf{else}\ \mathbf{t}_{\text{ANF}} \mid \mathbf{assume}\ x \mid \mathbf{weight}\ x \end{aligned} \tag{2}$$

We use TANF to denote the set of all terms tANF. Unlike t ∈ T, tANF ∈ TANF enforces that a variable bound by a let labels each subterm in the program. Furthermore, we can automatically transform any program in T to a semantically equivalent TANF program, and TANF ⊂ T. Therefore, we assume in the remainder of the paper that all terms are in ANF.

Given the importance of alignment in universal PPLs, it is somewhat surprising that there are no previous attempts to give a formal definition of its meaning. Here, we give a first such formal definition, but before defining alignment, we require a way to restrict, or filter, sequences.

Definition 5 (Restriction of sequences). For all l ∈ L and Y ⊆ X, l|<sub>Y</sub> (the restriction of l to Y) is the subsequence of l with all elements not in Y removed.

For example, [x, y, z, y, x]|<sub>{x,z}</sub> = [x, z, x]. We now formally define alignment.

Definition 6 (Alignment). For t ∈ TANF, let X<sub>t</sub> denote all variables that occur in t. The sets A<sub>t</sub> ∈ 𝒜<sub>t</sub>, A<sub>t</sub> ⊆ X<sub>t</sub>, are the largest sets such that, for arbitrary ∅ ⊢ t <sup>s<sub>1</sub></sup>⇓<sup>w<sub>1</sub></sup><sub>l<sub>1</sub></sub> v<sub>1</sub> and ∅ ⊢ t <sup>s<sub>2</sub></sup>⇓<sup>w<sub>2</sub></sup><sub>l<sub>2</sub></sub> v<sub>2</sub>, l<sub>1</sub>|<sub>A<sub>t</sub></sub> = l<sub>2</sub>|<sub>A<sub>t</sub></sub>.

For a given A<sub>t</sub>, the aligned expressions (expressions bound by a let to a variable name in A<sub>t</sub>) are those that occur in the same order in every execution, regardless of random draws. We seek the largest sets, as A<sub>t</sub> = ∅ is always a trivial solution. Assume we have a program with X<sub>t</sub> = {x, y, z} and such that l = [x, y, x, z, x] and l = [x, y, x, z, x, y] are the only possible sequences of let bindings. Then, A<sub>t</sub> = {x, z} is the only possibility. It is also possible to have multiple choices for A<sub>t</sub>. For example, if l = [x, y, z] and l = [x, z, y] are the only possibilities, then 𝒜<sub>t</sub> = {{x, z}, {x, y}}. Next, assume that we transform the programs in Fig. 2a and Fig. 1a to ANF. The expression labeled by x in Fig. 2a is then clearly not aligned, as random draws determine how many times it executes (l could be, e.g., [x, x] or [x, x, x, x]). Conversely, the expression n (line 13) in Fig. 1a is aligned, as its number and order of evaluations do not depend on any random draws.
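The two small examples above can be reproduced by brute force. The Python sketch below assumes that the finite list of sequences passed in covers all possible executions, and it reads "largest" as maximal with respect to set inclusion; both are assumptions of this illustration, and the function names are hypothetical.

```python
from itertools import combinations

def restrict(l, Y):
    """Restriction of the sequence l to the names in Y (Definition 5)."""
    return [x for x in l if x in Y]

def alignment_sets(sequences, variables):
    """All inclusion-maximal Y such that every sequence agrees when restricted to Y."""
    consistent = []
    for k in range(len(variables) + 1):
        for Y in map(set, combinations(sorted(variables), k)):
            if len({tuple(restrict(l, Y)) for l in sequences}) <= 1:
                consistent.append(Y)
    # Keep only candidates that are not strictly contained in another candidate.
    return [Y for Y in consistent if not any(Y < Z for Z in consistent)]

first = alignment_sets([list("xyxzx"), list("xyxzxy")], {"x", "y", "z"})
second = alignment_sets([list("xyz"), list("xzy")], {"x", "y", "z"})
```

The first call yields the single set {x, z}, and the second yields the two sets {x, y} and {x, z}. Enumerating executions like this is of course impossible for real programs, which is why a static analysis is needed.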

Definition 6 is context insensitive: for a given A<sub>t</sub>, each x is either aligned or unaligned. One could also consider a context-sensitive definition of alignment in which x can be aligned in some contexts and unaligned in others. A context could, for example, be the sequence of function applications (i.e., the call stack) leading up to an expression. Considering different contexts for x is complicated and difficult to take full advantage of. We justify the choice of context-insensitive alignment with the real-world models in Section 7, none of which requires a context-sensitive alignment.

With alignment defined, we now move on to the static alignment analysis.

#### 4.2 Alignment Analysis

The basis for the alignment analysis is 0-CFA [34,42], a static analysis framework for higher-order functional programs. The prefix 0 indicates that 0-CFA is context insensitive. There is also a set of analyses k-CFA [30] that adds increasing amounts (with k ∈ ℕ) of context sensitivity to 0-CFA. We could use such analyses with a context-sensitive version of Definition 6. However, the potential benefit of k-CFA is also offset by the worst-case exponential time complexity, already at k = 1. In contrast, the time complexity of 0-CFA is polynomial (cubic in the worst case). The alignment analysis for the models in Section 7 runs instantaneously, which indicates that the time complexity is not a problem in practice.

```
1  let n1 = ¬ in let n2 = ¬ in
2  let one = 1 in
3  let half = 0.5 in let c = true in
4  let f1 = λx1. let t1 = weight one in x1 in
5  let f2 = λx2. let t2 = weight one in t2 in
6  let f3 = λx3. let t3 = weight one in t3 in
7  let f4 = λx4. let t4 = weight one in t4 in
8  let bern = Bernoulli in
9  let d1 = bern half in
10 let a1 = assume d1 in
11 let v1 = f1 one in
12 let v2 = n1 a1 in
13 let v3 = n2 c in
14 let f5 =
15   if a1 then let t5 = f4 one in f2
16   else f3
17 in
18 let v4 = f5 one in
19 let i1 =
20   if c then let t6 = f1 one in t6
21   else one
22 in i1
```
Fig. 4: A program texample ∈ TANF illustrating the analysis.

The extensions to 0-CFA required to analyze alignment are non-trivial to design, but the resulting formalization is surprisingly simple. The challenge is instead to prove that the extensions correctly capture the alignment property from Definition 6. We extend 0-CFA to analyze stochastic values and alignment in programs t ∈ TANF. As with most static analyses, our analysis is sound but conservative (i.e., sound but incomplete)—the analysis may mark aligned expressions of programs as unaligned, but not vice versa. That the analysis is conservative does not degrade the alignment analysis results for any model in Section 7, which justifies the approach. We divide the formal analysis into two algorithms. Algorithm 1 generates constraints for t that a valid analysis solution must satisfy. This section describes Algorithm 1 and the generated constraints. The second algorithm computes a solution that satisfies the generated constraints. We describe the algorithm at a high level, but omit a full formalization.†

For soundness of the analysis, we require ⟨λx. t, ρ⟩ ∉ C (recall that C is the set of intrinsics). That is, closures are not in C. By Definition 3, this implies that closures are not in the sample space of probability distributions in D and that evaluating intrinsics never produces closures (this would unnecessarily complicate the analysis without any benefit).

In addition to standard 0-CFA constraints, Algorithm 1 generates new constraints for stochastic values and unalignment. We use the contrived but illustrative program in Fig. 4 as an example. Note that, while omitted from Fig. 4 for ease of presentation, the analysis also supports recursion introduced through let rec. Stochastic values are values in the program affected by random variables. Stochastic values initially originate at assume and then propagate through programs via function applications and if expressions. For example, a<sub>1</sub> (line 10) is stochastic because of assume. We subsequently use a<sub>1</sub> to define v<sub>2</sub> via n<sub>1</sub> (line 12), which is then also stochastic. Similarly, a<sub>1</sub> is the condition for the if resulting in f<sub>5</sub> (line 14), and the function f<sub>5</sub> is therefore also stochastic. When we apply f<sub>5</sub>, it results in yet another stochastic value, v<sub>4</sub> (line 18). In conclusion, the stochastic values are a<sub>1</sub>, v<sub>2</sub>, f<sub>5</sub>, and v<sub>4</sub>.

Consider the flow of unalignment in Fig. 4. We mark expressions that may execute due to stochastic branching as unaligned. From our analysis of stochastic values, the program's only stochastic if condition is at line 15, and we determine that all expressions directly within the branches are unaligned. That is, the expression labeled by t<sub>5</sub> is unaligned. Furthermore, we apply the variable f<sub>4</sub> when defining t<sub>5</sub>. Thus, all expressions in bodies of lambdas that flow to f<sub>4</sub> are unaligned. Here, it implies that t<sub>4</sub> is unaligned. Finally, we established that the function f<sub>5</sub> produced at line 15 is stochastic. Due to the application at line 18, all names bound by lets in bodies of lambdas that flow to f<sub>5</sub> are unaligned. Here, it implies that t<sub>2</sub> and t<sub>3</sub> are unaligned. In conclusion, the unaligned expressions are named by t<sub>2</sub>, t<sub>3</sub>, t<sub>4</sub>, and t<sub>5</sub>. For example, aligned SMC therefore resamples at the weight at t<sub>1</sub>, but not at the weights at t<sub>2</sub>, t<sub>3</sub>, and t<sub>4</sub>.

Consider the program in Fig. 1a again, and assume it is transformed to ANF. The alignment analysis must mark all names bound within the stochastic if at line 3 as unaligned because a stochastic value flows to its condition. In particular, the weight expressions at lines 5 and 8 are unaligned (and the weight at line 12 is aligned). Thus, aligned SMC resamples only at line 12.

To formalize the flow of stochastic values, we define abstract values a ::= λx.y | stoch | const n, where x, y ∈ X and n ∈ ℕ. We use A to denote the set of all abstract values. The stoch abstract value is new and represents stochastic values. The λx.y and const n abstract values are standard and represent abstract closures and intrinsics, respectively. For each variable name x in the program, we define a set S<sub>x</sub> containing abstract values that may occur at x. For example, in Fig. 4, we have stoch ∈ S<sub>a<sub>1</sub></sub>, (λx<sub>2</sub>.t<sub>2</sub>) ∈ S<sub>f<sub>2</sub></sub>, and (const 1) ∈ S<sub>n<sub>1</sub></sub>. The abstract value λx<sub>2</sub>.t<sub>2</sub> represents all closures originating at λx<sub>2</sub>, and const 1 represents intrinsic functions in C of arity 1 (in our example, ¬). The body of the abstract lambda is the variable name labeling the body, not the body itself. For example, t<sub>2</sub> labels the body let t<sub>2</sub> = weight one in t<sub>2</sub> of λx<sub>2</sub>. Due to ANF, all terms have a label, which the function name in Algorithm 1 formalizes.

We also define booleans unaligned<sub>x</sub> that state whether or not the expression labeled by x is unaligned. For example, we previously reasoned that unaligned<sub>x</sub> = true for x ∈ {t<sub>2</sub>, t<sub>3</sub>, t<sub>4</sub>, t<sub>5</sub>} in Fig. 4. The alignment analysis aims to determine minimal sets S<sub>x</sub> and boolean assignments of unaligned<sub>x</sub> for every program variable x ∈ X. A trivial solution is that all abstract values (there is a finite number of them in the program) flow to each program variable and that unaligned<sub>x</sub> = true for all x ∈ X. This solution is sound but useless. To compute a more precise solution, we follow the rules given by constraints c ∈ R.†
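The flow of stochastic values and unalignment in Fig. 4 can be replayed with a compact Python sketch. It is a simplification under stated assumptions and not the implementation in Miking CorePPL: instead of generating a constraint set and solving it separately, the sketch re-applies the rules of Algorithm 1 directly to the program until nothing changes, and the tuple encoding of ANF terms and all helper names are hypothetical.

```python
from collections import defaultdict

STOCH = ("stoch",)
W = ("weight", "one")

def lam(y, bindings, result):
    return ("lam", y, (bindings, result))

def app(f, a):
    return ("app", f, a)

# An ANF term is (bindings, result_variable). This encodes Fig. 4;
# ("const", n) stands for an intrinsic of arity n.
FIG4 = ([
    ("n1", ("const", 1)), ("n2", ("const", 1)), ("one", ("const", 0)),
    ("half", ("const", 0)), ("c", ("const", 0)),
    ("f1", lam("x1", [("t1", W)], "x1")), ("f2", lam("x2", [("t2", W)], "t2")),
    ("f3", lam("x3", [("t3", W)], "t3")), ("f4", lam("x4", [("t4", W)], "t4")),
    ("bern", ("const", 1)), ("d1", app("bern", "half")),
    ("a1", ("assume", "d1")), ("v1", app("f1", "one")),
    ("v2", app("n1", "a1")), ("v3", app("n2", "c")),
    ("f5", ("if", "a1", ([("t5", app("f4", "one"))], "f2"), ([], "f3"))),
    ("v4", app("f5", "one")),
    ("i1", ("if", "c", ([("t6", app("f1", "one"))], "t6"), ([], "one"))),
], "i1")

def names(t):
    """Names bound by the top-level let sequence of t."""
    return {x for x, _ in t[0]}

def let_bound(t):
    """All let-bound names in t, including those nested in lambdas and branches."""
    out = set()
    for x, rhs in t[0]:
        out.add(x)
        for sub in rhs[2:]:
            if isinstance(sub, tuple):
                out |= let_bound(sub)
    return out

def analyze(program):
    S, unaligned = defaultdict(set), set()

    def visit(t):
        for x, rhs in t[0]:
            kind = rhs[0]
            if kind == "var":
                S[x] |= S[rhs[1]]
            elif kind == "const" and rhs[1] > 0:
                S[x].add(("const", rhs[1]))
            elif kind == "lam":
                _, y, body = rhs
                visit(body)
                S[x].add(("lam", y, body[1]))
                if y in unaligned:
                    unaligned.update(names(body))
            elif kind == "app":
                _, lhs, arg = rhs
                vals = list(S[lhs])
                for a in vals:
                    if a[0] == "lam":
                        S[a[1]] |= S[arg]
                        S[x] |= S[a[2]]
                        if x in unaligned or STOCH in vals:
                            unaligned.add(a[1])
                    elif a[0] == "const":
                        if a[1] > 1:
                            S[x].add(("const", a[1] - 1))
                        if STOCH in S[arg]:
                            S[x].add(STOCH)
                    else:
                        S[x].add(STOCH)
            elif kind == "if":
                _, y, tt, te = rhs
                visit(tt)
                visit(te)
                S[x] |= S[tt[1]] | S[te[1]]
                if STOCH in S[y]:
                    S[x].add(STOCH)
                if STOCH in S[y] or x in unaligned:
                    unaligned.update(names(tt) | names(te))
            elif kind == "assume":
                S[x].add(STOCH)

    def size():
        return sum(len(v) for v in S.values()) + len(unaligned)

    while True:  # iterate to a fixed point; all rules only ever add facts
        before = size()
        visit(program)
        if size() == before:
            return S, unaligned

S, unaligned = analyze(FIG4)
lets = let_bound(FIG4)
stochastic = sorted(x for x in lets if STOCH in S[x])
unaligned_lets = sorted(unaligned & lets)
```

For Fig. 4, `stochastic` contains a1, f5, v2, and v4, and `unaligned_lets` contains t2, t3, t4, and t5, matching the reasoning above. Note that the sketch records unalignment of a lambda on its parameter name (e.g., x4), following the unaligned<sub>y</sub> constraints in Algorithm 1, and filters to let-bound names only at the end.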

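We present the constraints through the generateConstraints function in Algorithm 1 and for the example in Fig. 4. There are no constraints for variables that occur at the end of ANF let sequences (line 2 in Algorithm 1), and the case for let expressions (lines 3–36) instead produces all constraints. The cases for aliases (line 6), intrinsics (line 7), assume (line 35), and weight (line 36) are the simplest. Aliases of the form let x = y in t<sub>2</sub> establish S<sub>y</sub> ⊆ S<sub>x</sub>. That is, all abstract values at y are also in x. Intrinsic operations result in a const abstract value. For example, the definition of n<sub>1</sub> at line 1 in Fig. 4 results in the constraint const 1 ∈ S<sub>n<sub>1</sub></sub>. Applications of assume are the source of stochastic values. For example, the definition of a<sub>1</sub> at line 10 results in the constraint stoch ∈ S<sub>a<sub>1</sub></sub>. Note that assume cannot produce any other abstract values, as we only allow distributions over intrinsics with arity 0 (see Definition 3). Finally, we use weight only for its side effect (likelihood updating), and therefore weights do not produce any abstract values and consequently no constraints.

Algorithm 1 Constraint generation function for t ∈ T<sub>ANF</sub>. We denote the power set of a set E with P(E).

$$\begin{array}{rl}
1 & \textbf{function}\ \text{generateConstraints}(t)\colon T_{\mathrm{ANF}} \to \mathcal{P}(R) = \textbf{match}\ t\ \textbf{with} \\
2 & \mid x \to \emptyset \\
3 & \mid \textbf{let}\ x = t_1\ \textbf{in}\ t_2 \to \\
4 & \quad \text{generateConstraints}(t_2)\ \cup \\
5 & \quad \textbf{match}\ t_1\ \textbf{with} \\
6 & \quad \mid y \to \{S_y \subseteq S_x\} \\
7 & \quad \mid c \to \textbf{if}\ |c| > 0\ \textbf{then}\ \{\texttt{const}\ |c| \in S_x\} \\
8 & \quad \qquad \textbf{else}\ \emptyset \\
9 & \quad \mid \lambda y.\ t_y \to \text{generateConstraints}(t_y) \\
10 & \quad \qquad \cup\ \{\lambda y.\ \text{name}(t_y) \in S_x\} \\
11 & \quad \qquad \cup\ \{\mathit{unaligned}_y \Rightarrow \mathit{unaligned}_n \\
12 & \quad \qquad \qquad \mid n \in \text{names}(t_y)\} \\
13 & \quad \mid \mathit{lhs}\ \mathit{rhs} \to \{ \\
14 & \quad \qquad \forall z \forall y\ \lambda z.y \in S_{\mathit{lhs}} \\
15 & \quad \qquad \quad \Rightarrow (S_{\mathit{rhs}} \subseteq S_z) \land (S_y \subseteq S_x), \\
16 & \quad \qquad \forall n\ (\texttt{const}\ n \in S_{\mathit{lhs}}) \land (n > 1) \\
17 & \quad \qquad \quad \Rightarrow \texttt{const}\ n - 1 \in S_x, \\
18 & \quad \qquad \texttt{stoch} \in S_{\mathit{lhs}} \Rightarrow \texttt{stoch} \in S_x, \\
19 & \quad \qquad \texttt{const}\ \_ \in S_{\mathit{lhs}} \\
20 & \quad \qquad \quad \Rightarrow (\texttt{stoch} \in S_{\mathit{rhs}} \Rightarrow \texttt{stoch} \in S_x), \\
21 & \quad \qquad \mathit{unaligned}_x \\
22 & \quad \qquad \quad \Rightarrow (\forall y\ \lambda y.\_ \in S_{\mathit{lhs}} \Rightarrow \mathit{unaligned}_y), \\
23 & \quad \qquad \texttt{stoch} \in S_{\mathit{lhs}} \\
24 & \quad \qquad \quad \Rightarrow (\forall y\ \lambda y.\_ \in S_{\mathit{lhs}} \Rightarrow \mathit{unaligned}_y) \\
25 & \quad \} \\
26 & \quad \mid \textbf{if}\ y\ \textbf{then}\ t_t\ \textbf{else}\ t_e \to \\
27 & \quad \qquad \text{generateConstraints}(t_t) \cup \text{generateConstraints}(t_e) \\
28 & \quad \qquad \cup\ \{S_{\text{name}(t_t)} \subseteq S_x, \\
29 & \quad \qquad \quad S_{\text{name}(t_e)} \subseteq S_x, \\
30 & \quad \qquad \quad \texttt{stoch} \in S_y \Rightarrow \texttt{stoch} \in S_x\} \\
31 & \quad \qquad \cup\ \{\mathit{unaligned}_x \Rightarrow \mathit{unaligned}_n \\
32 & \quad \qquad \qquad \mid n \in \text{names}(t_t) \cup \text{names}(t_e)\} \\
33 & \quad \qquad \cup\ \{\texttt{stoch} \in S_y \Rightarrow \mathit{unaligned}_n \\
34 & \quad \qquad \qquad \mid n \in \text{names}(t_t) \cup \text{names}(t_e)\} \\
35 & \quad \mid \textbf{assume}\ \_ \to \{\texttt{stoch} \in S_x\} \\
36 & \quad \mid \textbf{weight}\ \_ \to \emptyset \\
37 & \\
38 & \textbf{function}\ \text{name}(t)\colon T_{\mathrm{ANF}} \to X = \\
39 & \quad \textbf{match}\ t\ \textbf{with} \\
40 & \quad \mid x \to x \\
41 & \quad \mid \textbf{let}\ x = t_1\ \textbf{in}\ t_2 \to \text{name}(t_2) \\
42 & \\
43 & \textbf{function}\ \text{names}(t)\colon T_{\mathrm{ANF}} \to \mathcal{P}(X) = \\
44 & \quad \textbf{match}\ t\ \textbf{with} \\
45 & \quad \mid x \to \emptyset \\
46 & \quad \mid \textbf{let}\ x = \_\ \textbf{in}\ t_2 \to \{x\} \cup \text{names}(t_2)
\end{array}$$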
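The four simple cases can be illustrated with a small Python sketch. It is not the Miking CorePPL implementation: the tuple encoding of ANF terms, the function name `generate_simple_constraints`, the constraint tags `"subset"` and `"member"`, and the names `b1` and `w1` in the example program are all assumptions made for illustration. Abstractions, applications, and ifs are deliberately left out.

```python
# Hypothetical tuple encoding of ANF terms (not the paper's data structures):
#   ("var", x)               a variable ending a let sequence
#   ("let", x, rhs, body)    let x = rhs in body
# Right-hand sides covered by this sketch:
#   ("var", y)               alias
#   ("const", arity)         intrinsic with the given arity
#   ("assume",), ("weight",) probabilistic constructs

def generate_simple_constraints(t):
    """Collect the constraints for aliases, intrinsics, assume, and weight."""
    constraints = []
    while t[0] == "let":
        _, x, rhs, body = t
        kind = rhs[0]
        if kind == "var":                     # alias: S_y is a subset of S_x
            constraints.append(("subset", rhs[1], x))
        elif kind == "const" and rhs[1] > 0:  # intrinsic: const |c| in S_x
            constraints.append(("member", ("const", rhs[1]), x))
        elif kind == "assume":                # assume: stoch in S_x
            constraints.append(("member", ("stoch",), x))
        # weight and arity-0 intrinsics generate no constraints
        t = body
    return constraints


# let n1 = (intrinsic of arity 1) in let a1 = assume _ in
# let b1 = a1 in let w1 = weight _ in w1
program = ("let", "n1", ("const", 1),
           ("let", "a1", ("assume",),
            ("let", "b1", ("var", "a1"),
             ("let", "w1", ("weight",),
              ("var", "w1")))))

simple = generate_simple_constraints(program)
```

For this program, the sketch yields const 1 ∈ S<sub>n<sub>1</sub></sub>, stoch ∈ S<sub>a<sub>1</sub></sub>, and S<sub>a<sub>1</sub></sub> ⊆ S<sub>b<sub>1</sub></sub>, and nothing for the weight.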

The cases for abstractions (line 9), applications (line 13), and ifs (line 26) are more complex. The abstraction at line 4 in Fig. 4 generates (omitting the recursively generated constraints for the abstraction body t<sub>y</sub>) the constraints {λx<sub>1</sub>.x<sub>1</sub> ∈ S<sub>f<sub>1</sub></sub>} ∪ {unaligned<sub>x<sub>1</sub></sub> ⇒ unaligned<sub>t<sub>1</sub></sub>}. The first constraint is standard: the abstract lambda λx<sub>1</sub>.x<sub>1</sub> flows to S<sub>f<sub>1</sub></sub>. The second constraint states that if the abstraction is unaligned, all expressions in its body (here, only t<sub>1</sub>) are unaligned. We define the sets of expressions within abstraction bodies and if branches through the names function in Algorithm 1 (line 43).
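As a small illustration of the two helper functions, the following Python sketch mirrors name and names from Algorithm 1. The tuple encoding of ANF terms is an assumption made for this example only.

```python
# Hypothetical tuple encoding of ANF terms:
#   ("var", x)               a variable ending a let sequence
#   ("let", x, rhs, body)    let x = rhs in body

def name(t):
    """Return the variable that labels t: the last variable of its let sequence."""
    while t[0] == "let":
        t = t[3]
    return t[1]

def names(t):
    """Return the names bound by the top-level let sequence of t."""
    bound = set()
    while t[0] == "let":
        bound.add(t[1])
        t = t[3]
    return bound


# The body "let t2 = one in t2" of the abstraction over x2 discussed earlier.
body = ("let", "t2", ("var", "one"), ("var", "t2"))
label = name(body)
bound_names = names(body)
```

Here `label` is `"t2"`, the name that the abstract lambda λx<sub>2</sub>.t<sub>2</sub> carries, and `bound_names` is `{"t2"}`, the set that becomes unaligned when the abstraction does. As in Algorithm 1, `names` does not descend into nested abstractions or branches, since their own constraints cover them.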

The application f<sub>5</sub> one at line 18 in Fig. 4 generates the constraints

$$\begin{aligned} & \{ \forall z \forall y \ \lambda z. y \in S_{f_5} \Rightarrow (S_{\mathit{one}} \subseteq S_z) \land (S_y \subseteq S_{v_4}), \\ & \ \ \forall n \ (\texttt{const} \ n \in S_{f_5}) \land (n > 1) \Rightarrow \texttt{const} \ n - 1 \in S_{v_4}, \\ & \ \ \texttt{stoch} \in S_{f_5} \Rightarrow \texttt{stoch} \in S_{v_4}, \\ & \ \ \texttt{const} \ \_ \in S_{f_5} \Rightarrow (\texttt{stoch} \in S_{\mathit{one}} \Rightarrow \texttt{stoch} \in S_{v_4}), \\ & \ \ \mathit{unaligned}_{v_4} \Rightarrow (\forall y \ \lambda y. \_ \in S_{f_5} \Rightarrow \mathit{unaligned}_y), \\ & \ \ \texttt{stoch} \in S_{f_5} \Rightarrow (\forall y \ \lambda y. \_ \in S_{f_5} \Rightarrow \mathit{unaligned}_y) \} \end{aligned} \tag{3}$$

The first constraint is standard: if an abstract value λz.y flows to f<sub>5</sub>, the abstract values of one (the right-hand side) flow to z. Furthermore, the result of the application, given by the body name y, must flow to the result v<sub>4</sub> of the application. The second constraint is also relatively standard: if an intrinsic function of arity n is applied, it produces a const of arity n − 1. The other constraints are new and specific to stochastic values and unalignment. The third constraint states that if the function is stochastic, the result is stochastic. The fourth constraint states that if we apply an intrinsic function to a stochastic argument, the result is stochastic. We could also make the analysis of intrinsic applications less conservative through intrinsic-specific constraints. The fifth and sixth constraints state that if the expression (labeled by v<sub>4</sub>) is unaligned or the function is stochastic, all abstract lambdas that flow to the function are unaligned.

The if resulting in f<sub>5</sub> at line 14 in Fig. 4 generates (omitting the recursively generated constraints for the branches t<sub>t</sub> and t<sub>e</sub>) the constraints

$$\begin{aligned} & \{ S_{\text{name}(f_2)} \subseteq S_{f_5},\ S_{\text{name}(f_3)} \subseteq S_{f_5},\ \texttt{stoch} \in S_{a_1} \Rightarrow \texttt{stoch} \in S_{f_5} \} \\ & \cup \{ \mathit{unaligned}_{f_5} \Rightarrow \mathit{unaligned}_{t_5} \} \cup \{ \texttt{stoch} \in S_{a_1} \Rightarrow \mathit{unaligned}_{t_5} \} \end{aligned} \tag{4}$$

The first two constraints are standard and state that the result of the branches flows to the result of the if expression. The remaining constraints are new. The third constraint states that if the condition is stochastic, the result is stochastic. The last two constraints state that if the if is unaligned or if the condition is stochastic, all names in the branches (here, only t5) are unaligned.

Given constraints for a program, we need to compute a solution satisfying all constraints. We do this by repeatedly iterating through all the constraints and propagating abstract values accordingly. We terminate when we reach a fixed point, i.e., when no constraint results in an update of either S<sub>x</sub> or unaligned<sub>x</sub> for any x in the program. We extend the 0-CFA constraint propagation algorithm to also handle the constraints generated for tracking stochastic values and unalignment.† Specifically, the algorithm is a function analyzeAlign: T<sub>ANF</sub> → ((X → P(A)) × P(X)) that returns a map associating each variable to a set of abstract values and a set of unaligned variables. In other words, analyzeAlign computes a solution to S<sub>x</sub> and unaligned<sub>x</sub> for each x in the analyzed program. For example, analyzeAlign(t<sub>example</sub>) results in
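The fixed-point iteration can be sketched in Python as follows. This is a naive illustration under stated assumptions, not the analyzeAlign implementation: constraints are encoded as closures that report whether they changed the solution, all helper names (`add`, `mark`, `member`, `subset`, `application`, `solve`, and so on) are hypothetical, and only the constraints quoted in the text for f<sub>2</sub>, f<sub>3</sub>, f<sub>5</sub>, and v<sub>4</sub> are included, so the solution covers just a fragment of the example program.

```python
STOCH = ("stoch",)  # abstract values are tuples: STOCH, ("lam", z, y), ("const", n)

def add(S, x, values):
    """Add values to S[x] and report whether S[x] grew."""
    new = set(values) - S[x]
    S[x] |= new
    return bool(new)

def mark(U, xs):
    """Mark names as unaligned and report whether U grew."""
    new = set(xs) - U
    U |= new
    return bool(new)

def member(a, x):                   # a in S_x
    return lambda S, U: add(S, x, {a})

def subset(y, x):                   # S_y subset of S_x
    return lambda S, U: add(S, x, S[y])

def stoch_to_stoch(y, x):           # stoch in S_y  =>  stoch in S_x
    return lambda S, U: STOCH in S[y] and add(S, x, {STOCH})

def stoch_to_unaligned(y, ns):      # stoch in S_y  =>  unaligned_n for n in ns
    return lambda S, U: STOCH in S[y] and mark(U, ns)

def unaligned_to_unaligned(x, ns):  # unaligned_x  =>  unaligned_n for n in ns
    return lambda S, U: x in U and mark(U, ns)

def application(lhs, rhs, x):       # the six constraints for let x = lhs rhs
    def propagate(S, U):
        lams = [a for a in S[lhs] if a[0] == "lam"]
        consts = [a for a in S[lhs] if a[0] == "const"]
        changed = False
        for _, z, y in lams:
            changed |= add(S, z, S[rhs]) | add(S, x, S[y])
        for _, n in consts:
            if n > 1:
                changed |= add(S, x, {("const", n - 1)})
        if STOCH in S[lhs] or (consts and STOCH in S[rhs]):
            changed |= add(S, x, {STOCH})
        if x in U or STOCH in S[lhs]:
            changed |= mark(U, [z for _, z, _ in lams])
        return changed
    return propagate

def solve(constraints, variables):
    """Iterate over all constraints until no constraint updates S or U."""
    S = {v: set() for v in variables}
    U = set()
    changed = True
    while changed:
        changed = False
        for constraint in constraints:
            if constraint(S, U):
                changed = True
    return S, U


variables = ["a1", "f2", "f3", "f5", "one", "v4", "x2", "x3", "t2", "t3", "t5"]
constraints = [
    member(STOCH, "a1"),                                               # assume
    member(("lam", "x2", "t2"), "f2"), unaligned_to_unaligned("x2", ["t2"]),
    member(("lam", "x3", "t3"), "f3"), unaligned_to_unaligned("x3", ["t3"]),
    subset("one", "t2"),                                               # let t2 = one
    subset("f2", "f5"), subset("f3", "f5"), stoch_to_stoch("a1", "f5"),      # (4)
    unaligned_to_unaligned("f5", ["t5"]), stoch_to_unaligned("a1", ["t5"]),  # (4)
    application("f5", "one", "v4"),                                          # (3)
]
S, U = solve(constraints, variables)
```

On this fragment, the iteration stabilizes after three passes with S<sub>f<sub>5</sub></sub> = {λx<sub>2</sub>.t<sub>2</sub>, λx<sub>3</sub>.t<sub>3</sub>, stoch}, S<sub>v<sub>4</sub></sub> = {stoch}, and t<sub>2</sub>, t<sub>3</sub>, t<sub>5</sub> marked unaligned. In this encoding, the lambda parameters x<sub>2</sub> and x<sub>3</sub> act as the markers through which the application constraints reach the bodies. A worklist-based propagation would avoid revisiting unaffected constraints; the naive loop is chosen here only for brevity.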

$$\begin{aligned} S_{n_1} &= \{\texttt{const}\ 1\} \quad S_{n_2} = \{\texttt{const}\ 1\} \quad S_{f_1} = \{\lambda x_1.x_1\} \quad S_{f_2} = \{\lambda x_2.t_2\} \\ S_{f_3} &= \{\lambda x_3.t_3\} \quad S_{f_4} = \{\lambda x_4.t_4\} \quad S_{a_1} = \{\texttt{stoch}\} \quad S_{v_2} = \{\texttt{stoch}\} \\ S_{f_5} &= \{\lambda x_2.t_2, \lambda x_3.t_3, \texttt{stoch}\} \quad S_{v_4} = \{\texttt{stoch}\} \quad S_n = \emptyset \mid \text{other } n \in X \\ \mathit{unaligned}_n &= \text{true} \mid n \in \{t_2, t_3, t_4, t_5\} \quad \mathit{unaligned}_n = \text{false} \mid \text{other } n \in X. \end{aligned} \tag{5}$$

The example confirms our earlier intuition: an intrinsic (¬) flows to n<sub>1</sub>, stoch flows to a<sub>1</sub>, f<sub>5</sub> is stochastic and originates at either (λx<sub>2</sub>.t<sub>2</sub>) or (λx<sub>3</sub>.t<sub>3</sub>), and the unaligned variables are t<sub>2</sub>, t<sub>3</sub>, t<sub>4</sub>, and t<sub>5</sub>. We now give soundness results.

Lemma 1 (0-CFA soundness). For every t ∈ TANF, the solution produced by analyzeAlign(t) satisfies the constraints generateConstraints(t).

Proof. The well-known soundness of 0-CFA extends to the new alignment constraints. See, e.g., Nielson et al. [34, Chapter 3] and Shivers [42]. □

Theorem 1 (Alignment analysis soundness). Assume $t \in T_{\mathrm{ANF}}$, $A_t$ from Definition 6, and an assignment to $S_x$ and $\mathit{unaligned}_x$ for $x \in X$ according to analyzeAlign(t). Let $\hat{A}_t = \{x \mid \neg \mathit{unaligned}_x\}$ and take arbitrary $\emptyset \vdash t \;{}^{s_1}\!\Downarrow^{w_1}_{l_1} v_1$ and $\emptyset \vdash t \;{}^{s_2}\!\Downarrow^{w_2}_{l_2} v_2$. Then, $l_1|_{\hat{A}_t} = l_2|_{\hat{A}_t}$ and consequently $\hat{A}_t \subseteq A_t$.

The proof† uses simultaneous structural induction over the derivations $\emptyset \vdash t \;{}^{s_1}\!\Downarrow^{w_1}_{l_1} v_1$ and $\emptyset \vdash t \;{}^{s_2}\!\Downarrow^{w_2}_{l_2} v_2$. At corresponding stochastic branches or stochastic function applications in the two derivations, a separate structural induction argument shows that, for the let-sequences $l'_1$ and $l'_2$ of the two stochastic subderivations, $l'_1|_{\hat{A}_t} = l'_2|_{\hat{A}_t} = []$. Combined, the two arguments give the result.

The result $\hat{A}_t \subseteq A_t$ (cf. Definition 6) shows that the analysis is conservative.

#### 4.3 Dynamic Alignment

An alternative to static alignment is dynamic alignment, which we explored in early stages when developing the alignment analysis. Dynamic alignment is fully context sensitive and amounts to introducing variables in programs that track (at runtime) when evaluation enters stochastic branching. To identify these stochastic branches, dynamic alignment also requires a runtime data structure that keeps track of the stochastic values. Similarly to k-CFA, dynamic alignment is potentially more precise than the 0-CFA approach. However, we discovered that dynamic alignment introduces significant runtime overhead. Again, we note that the models in Section 7 do not require a context-sensitive analysis, justifying the choice of 0-CFA over dynamic alignment and k-CFA.
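To make the idea of runtime tracking concrete, the following Python sketch shows one possible shape of dynamic alignment. It is only an illustration under strong assumptions and not the prototype explored during development: the `Stoch` wrapper, the `DynamicAlignment` class, and its `branch` and `weight` methods are hypothetical names, and the sketch ignores both stochastic function applications and the propagation of stochastic tags through computations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stoch:
    """Hypothetical runtime tag: wraps a value that depends on a random draw."""
    value: object

class DynamicAlignment:
    def __init__(self):
        self.stochastic_depth = 0  # > 0 while evaluation is inside a stochastic branch
        self.resample_at = []      # labels of weights reached outside stochastic branches

    def branch(self, cond, then_fn, else_fn):
        """Evaluate an if; track whether its condition is stochastic."""
        stochastic = isinstance(cond, Stoch)
        taken = cond.value if stochastic else cond
        if stochastic:
            self.stochastic_depth += 1
        try:
            return then_fn() if taken else else_fn()
        finally:
            if stochastic:
                self.stochastic_depth -= 1

    def weight(self, label, w):
        """Record a weight as a resampling point only outside stochastic branches."""
        if self.stochastic_depth == 0:
            self.resample_at.append(label)
        return w

def run(rt, coin):
    """A toy program: two weights under a stochastic if, then one weight after it."""
    rt.branch(Stoch(coin),
              lambda: rt.weight("then", 1.0),
              lambda: rt.weight("else", 10.0))
    rt.weight("after", 2.0)
    return rt.resample_at
```

Whichever branch the toy program takes, only the weight labeled `"after"` is recorded as a resampling point. The counter and the wrapper are exactly the kind of bookkeeping that adds overhead to every branch and every weight at runtime.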

# 5 Aligned SMC and MCMC

This section presents detailed algorithms for aligned SMC (Section 5.1) and aligned lightweight MCMC (Section 5.2). For a more pedagogical introduction to the algorithms, see Section 2. We assume a basic understanding of SMC and Metropolis–Hastings MCMC algorithms (see, e.g., Bishop [4]).

#### 5.1 Aligned SMC

We saw in Section 2.1 that SMC operates by executing many instances of t concurrently, and resampling them at calls to weight. Critically, resampling requires that the inference algorithm can both suspend and resume executions. Here, we assume that we can create execution instances e of the probabilistic program t, and that we can arbitrarily suspend and resume the instances. The technical details of suspension are beyond the scope of this paper. See Goodman and Stuhlmüller [14], Wood et al. [48], and Lundén et al. [25] for further details.

Algorithm 2 presents all steps for the aligned SMC inference algorithm. After running the alignment analysis and setting up the n execution instances, the algorithm iteratively executes and resamples the instances. Note that the algorithm resamples only at aligned weights (see Section 2.1).

Algorithm 2 Aligned SMC. The input is a program t ∈ TANF and the number of execution instances n.


```
if assume Bernoulli(0.5) then
  weight 1; weight 10; true
else
  weight 10; weight 1; false
```
(a) Aligned better than unaligned.

```
if assume Bernoulli(0.1) then
  weight 9;
  if assume Bernoulli(0.5)
  then weight 1.5 else weight 0.5;
  true
else (weight 1; false)
```
(b) Unaligned better than aligned.
Fig. 5: Programs illustrating properties of aligned and unaligned SMC. Fig. (a) shows a program better suited for aligned SMC. Fig. (b) shows a program better suited for unaligned SMC.

We conjecture that aligned SMC is preferable to unaligned SMC for all practically relevant models, and the evaluation in Section 7 supports this conjecture. However, it is possible to construct contrived programs in which unaligned SMC has the advantage. Consider the programs in Fig. 5, both encoding Bernoulli(0.5) distributions in a contrived way using weights. Fig. 5a takes one of two branches with equal probability. Unaligned SMC resamples at the first weights in each branch, while aligned SMC does not because the branch is stochastic. Due to the difference in likelihood, many more else executions survive resampling compared to then executions. However, due to the final weights in each branch, the branch likelihoods even out. That is, resampling at the first weights is detrimental, and unaligned SMC performs worse than aligned SMC.

Fig. 5b also takes one of two branches, but now with unequal probabilities. However, the two branches still have equal posterior probability due to the weights. The nested if in the then branch does not modify the overall branch likelihood, but adds variance. Aligned SMC does not resample for any weight within the branches, as the branch is stochastic. Consequently, only 10% of the executions in aligned SMC take the then branch, while half of the executions take the then branch in unaligned SMC (after resampling at the first weight). Therefore, unaligned SMC better explores the then branch and reduces the variance due to the nested if, which results in overall better inference accuracy.

We are not aware of any real model with the property in Fig. 5b. In practice, it seems best to always resample when using weight to condition on observed data. Such conditioning is, in practice, always done outside of stochastic branches, justifying the benefit of aligned SMC.
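The behavior on Fig. 5a can be reproduced with a small Python simulation. This is a sketch under simplifying assumptions rather than Algorithm 2: the program is hand-translated into an explicit, copyable execution state instead of using real execution suspension, resampling is multinomial, and the names `start`, `resume`, `smc`, and `spread` are hypothetical. Since both weights of each branch lie inside the stochastic if, aligned SMC never resamples in this program.

```python
import random
import statistics

# Fig. 5a as an explicit state: (branch taken, number of weights executed so far).
WEIGHTS = {True: (1.0, 10.0), False: (10.0, 1.0)}

def start(rng):
    """Run up to the first weight: draw the Bernoulli(0.5) condition."""
    return (rng.random() < 0.5, 0)

def resume(state):
    """Resume a suspended execution until its next weight."""
    branch, k = state
    return (branch, k + 1), WEIGHTS[branch][k]

def smc(n, resample_at_unaligned, rng):
    """Estimate the posterior probability that Fig. 5a returns true."""
    states = [start(rng) for _ in range(n)]
    weights = [1.0] * n
    for _ in range(2):                      # each execution has two weights
        resumed = [resume(s) for s in states]
        states = [s for s, _ in resumed]
        weights = [w * u for w, (_, u) in zip(weights, resumed)]
        if resample_at_unaligned:           # unaligned SMC resamples at every weight
            states = rng.choices(states, weights=weights, k=n)
            weights = [1.0] * n
    return sum(w for (b, _), w in zip(states, weights) if b) / sum(weights)

def spread(resample_at_unaligned, runs=100, n=500, seed=0):
    """Variance of the estimate over repeated runs."""
    rng = random.Random(seed)
    return statistics.pvariance(
        [smc(n, resample_at_unaligned, rng) for _ in range(runs)])
```

Calling `spread(False)` and `spread(True)` compares the two strategies: resampling at the first weights discards most then executions before their second weight evens out the likelihoods, so the estimate of the Bernoulli(0.5) posterior varies noticeably more across runs for unaligned SMC.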

Algorithm 3 Aligned lightweight MCMC. The input is a program t ∈ TANF, the number of steps n, and the global step probability g > 0.


function run() = Run t and do the following:

	- 1. If reuse = false, global = true, $n'_{i-1,k,l} \neq c$, or if $s'_{i-1,k,l}$ does not exist, sample a value x from d and set reuse ← false. Otherwise, reuse the sample $x = s'_{i-1,k,l}$ and set $w'_{-1} \leftarrow w'_{-1} \cdot p'_{i-1,k,l}$ and $w' \leftarrow w' \cdot f_d(x)$.
	- 2. Set $s'_{i,k,l} \leftarrow x$, $p'_{i,k,l} \leftarrow f_d(x)$, and $n'_{i,k,l} \leftarrow c$.
	- 3. Set l ← l + 1. In the program, bind c to the value x and resume execution.
	- 1. If j = k, global = true, or if $s_{i-1,k}$ does not exist, sample a value x from d normally. Otherwise, reuse the sample $x = s_{i-1,k}$. Set $w'_{-1} \leftarrow w'_{-1} \cdot p_{i-1,k}$ and $w' \leftarrow w' \cdot f_d(x)$.
	- 2. Set $s_{i,k} \leftarrow x$ and $p_{i,k} \leftarrow f_d(x)$.
	- 3. Set k ← k + 1, l ← 1, and reuse ← true. In the program, bind c to the value x and resume execution.

#### 5.2 Aligned Lightweight MCMC

Aligned lightweight MCMC is a version of lightweight MCMC [47], where the alignment analysis provides information about how to reuse random draws between executions. Algorithm 3, a Metropolis–Hastings algorithm in the context of PPLs, presents the details. Essentially, the algorithm executes the program repeatedly using the Run function, and redraws one aligned random draw in each step, while reusing all other aligned draws and as many unaligned draws as possible (illustrated in Section 2.2). It is possible to formally derive the Metropolis–Hastings acceptance ratio in step 5.† A key property in Algorithm 3 due to alignment (Definition 6) is that the length of s<sub>i</sub> (and p<sub>i</sub>) is constant, as executing t always results in the same number of aligned random draws.

In addition to redrawing only one aligned random draw, each step has a probability g > 0 of being global, meaning that inference redraws every random draw in the program. Occasional global steps fix problems related to slow mixing and ergodicity of lightweight MCMC identified by Kiselyov [21]. In a global step, the Metropolis–Hastings acceptance ratio reduces to A = min(1, w<sub>i</sub>/w<sub>i−1</sub>).
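The global-step acceptance ratio can be illustrated with a Python sketch in which every step is global (g = 1), so no draws are ever reused. This is not Algorithm 3: it omits the sample databases and the aligned and unaligned reuse logic entirely. The program is a hand translation of Fig. 5b, and the function names `run_fig5b` and `global_mh` are hypothetical.

```python
import random

def run_fig5b(rng):
    """Execute Fig. 5b with fresh random draws; return (result, accumulated weight)."""
    if rng.random() < 0.1:                           # assume Bernoulli(0.1)
        w = 9.0                                      # weight 9
        w *= 1.5 if rng.random() < 0.5 else 0.5      # nested stochastic if
        return True, w
    return False, 1.0                                # weight 1

def global_mh(steps, rng):
    """Metropolis-Hastings in which every step redraws all random draws."""
    value, w_prev = run_fig5b(rng)
    samples = []
    for _ in range(steps):
        proposal, w_new = run_fig5b(rng)
        if rng.random() < min(1.0, w_new / w_prev):  # A = min(1, w_i / w_{i-1})
            value, w_prev = proposal, w_new
        samples.append(value)
    return samples

samples = global_mh(50_000, random.Random(0))
fraction_true = sum(samples) / len(samples)
```

Because both branches of Fig. 5b have equal posterior probability, `fraction_true` should be close to 0.5. The chain moves slowly out of the high-weight then executions, which is one reason why purely global steps are not sufficient in practice and are only mixed in with probability g.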

# 6 Implementation

We implement the alignment analysis (Section 4), aligned SMC (Section 5.1), and aligned lightweight MCMC (Section 5.2) for the functional PPL Miking CorePPL [25], implemented as part of the Miking framework [7]. We implement the alignment analysis as a core component in the Miking CorePPL compiler, and then use the analysis when compiling to two Miking CorePPL backends: RootPPL and Miking Core. RootPPL is a low-level PPL with built-in highly efficient SMC inference [25], and we extend the CorePPL to RootPPL compiler introduced by Lundén et al. [25] to support aligned SMC inference. Furthermore, we implement aligned lightweight MCMC inference standalone as a translation from Miking CorePPL to Miking Core. Miking Core is the general-purpose programming language of the Miking framework, currently compiling to OCaml.

The idealized calculus in (1) does not capture all features of Miking CorePPL. In particular, the alignment analysis implementation must support records, variants, sequences, and pattern matching over these. Extending 0-CFA to such language features is not new, but it does introduce a critical challenge for the alignment analysis: identifying all possible stochastic branches. Determining stochastic ifs is straightforward, as we simply check if stoch flows to the condition. However, complications arise when we add a match construct (and, in general, any type of branching construct). Consider the extension

$$\begin{array}{l} \mathtt{t} ::= \ldots \mid \mathtt{match} \ \mathtt{t} \ \mathtt{with} \ \mathtt{p} \ \mathtt{then} \ \mathtt{t} \ \mathtt{else} \ \mathtt{t} \mid \{ k_1 = x_1, \ldots, k_n = x_n \} \\ \mathtt{p} ::= x \mid \mathtt{true} \mid \mathtt{false} \mid \{ k_1 = \mathtt{p}, \ldots, k_n = \mathtt{p} \} \\ x, x_1, \ldots, x_n \in X \quad k_1, \ldots, k_n \in K \quad n \in \mathbb{N} \end{array} \tag{6}$$

of (1), adding records and simple pattern matching. K is a set of record keys. Assume we also extend the abstract values as a ::= . . . | {k<sub>1</sub> = X<sub>1</sub>, . . . , k<sub>n</sub> = X<sub>n</sub>}, where X<sub>1</sub>, . . . , X<sub>n</sub> ⊆ X. That is, we add an abstract record tracking the names in the program that flow to its entries. Consider the program match t<sub>1</sub> with { a = x<sub>1</sub>, b = false } then t<sub>2</sub> else t<sub>3</sub>. This match is, similarly to ifs, stochastic if stoch ∈ S<sub>t<sub>1</sub></sub>. It is also, however, stochastic in other cases. Assume we have two program variables, x and y, such that stoch ∈ S<sub>x</sub> and stoch ∉ S<sub>y</sub>. Now, the match is stochastic if, e.g., {a = {y}, b = {x}} ∈ S<sub>t<sub>1</sub></sub>, because the random value flowing from x to the pattern false may or may not match. However, it is not stochastic if, instead, S<sub>t<sub>1</sub></sub> = {{a = {x}, b = {y}}}. In that case, the randomness of x does not influence whether or not the branch is stochastic: the variable pattern x<sub>1</sub> for label a always matches.
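The record example can be captured by a small recursive check, sketched here in Python. The encodings are assumptions made for illustration and do not reflect the Miking CorePPL implementation: patterns are tuples, abstract records are dictionaries from keys to sets of names, S maps each name to a list of abstract values, and the function name `pattern_is_stochastic` is hypothetical. The sketch also has no guard against cyclic abstract records.

```python
STOCH = "stoch"

# Hypothetical pattern encoding:
#   ("var", x)             variable pattern, always matches
#   ("lit", value)         literal pattern such as true or false
#   ("record", {k: p})     record pattern with sub-patterns

def pattern_is_stochastic(pattern, name, S):
    """Can matching the values that flow to `name` against `pattern` depend on randomness?"""
    kind = pattern[0]
    if kind == "var":                      # variable patterns always match
        return False
    if STOCH in S[name]:                   # a random value meets a refutable pattern
        return True
    if kind == "record":
        for value in S[name]:
            if isinstance(value, dict):    # abstract record: key -> set of names
                for key, sub in pattern[1].items():
                    for source in value.get(key, ()):
                        if pattern_is_stochastic(sub, source, S):
                            return True
    return False


pattern = ("record", {"a": ("var", "x1"), "b": ("lit", False)})

# stoch flows to x but not to y, as in the text.
S_first = {"x": [STOCH], "y": [], "t1": [{"a": {"y"}, "b": {"x"}}]}
S_second = {"x": [STOCH], "y": [], "t1": [{"a": {"x"}, "b": {"y"}}]}

first = pattern_is_stochastic(pattern, "t1", S_first)    # x meets the pattern false
second = pattern_is_stochastic(pattern, "t1", S_second)  # x only meets the variable pattern
```

In the first case, the stochastic x flows to the refutable pattern false and the match is stochastic. In the second case, x only reaches the variable pattern x<sub>1</sub>, so the match is not stochastic.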

Our alignment analysis implementation handles the intricacies of identifying stochastic match cases for nested record, variant, and sequence patterns. In total, the alignment analysis, aligned SMC, and aligned lightweight MCMC implementations consist of approximately 1000 lines of code directly contributed as part of this paper. The code is available on GitHub [2].

# 7 Evaluation

This section evaluates aligned SMC and aligned lightweight MCMC on a set of models encoded in Miking CorePPL: CRBD [33,39] in Sections 7.1 and 7.5, ClaDS [28,39] in Section 7.2, state-space aircraft localization in Section 7.3, and latent Dirichlet allocation in Section 7.4. CRBD and ClaDS are non-trivial models of considerable interest in evolutionary biology and phylogenetics [39]. Similarly, LDA is a non-trivial topic model [5]. Running the alignment analysis took approximately 5 ms–30 ms for all models considered in the experiment, justifying that the time complexity is not a problem in practice.

We compare aligned SMC with standard unaligned SMC [14], which is identical to Algorithm 2, except that it resamples at every call to weight. † We carefully checked that automatic alignment corresponds to previous manual alignments of each model. For all SMC experiments, we estimate the normalizing constant produced as a by-product of SMC inference rather than the complete posterior distributions. The normalizing constant, also known as marginal likelihood or model evidence, frequently appears in Bayesian inference and gives the probability of the observed data averaged over the prior. The normalizing constant is useful for model comparison as it measures how well different probabilistic models fit the data (a larger normalizing constant indicates a better fit).
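For reference, the standard SMC estimator of the normalizing constant multiplies, over all resampling points, the average weight that the particles accumulated since the previous resampling point. The following Python sketch computes it in log space. It illustrates the textbook estimator under that assumption and is not taken from the RootPPL backend; the function names and the layout of `stages` are hypothetical.

```python
import math

def log_mean_exp(log_weights):
    """Numerically stable log of the average of exp(log_weights)."""
    m = max(log_weights)
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))

def log_normalizing_constant(stages):
    """Sum the log mean weights over all resampling stages.

    `stages` holds, for each resampling point, the log-weights that the
    particles accumulated since the previous resampling point.
    """
    return sum(log_mean_exp(stage) for stage in stages)


# Fig. 5a under aligned SMC: no resampling inside the stochastic branch, so
# there is a single stage in which every particle has total weight 1 * 10 = 10.
log_z = log_normalizing_constant([[math.log(10.0)] * 4])
```

For this toy case, `log_z` equals log 10, which matches the exact normalizing constant 0.5 · 10 + 0.5 · 10 = 10 of the program in Fig. 5a.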

We ran aligned and unaligned SMC with Miking CorePPL and the RootPPL backend configured for a single core (compiled with GCC 7.5.0). Lundén et al. [25] show that the RootPPL backend is significantly more efficient than other state-of-the-art PPL SMC implementations. We ran aligned and unaligned SMC inference 300 times (and with 3 warmup runs) for each experiment for 10<sup>4</sup>, 10<sup>5</sup>, and 10<sup>6</sup> executions (also known as particles in SMC literature).

We compare aligned lightweight MCMC to lightweight MCMC.† We implement both versions as compilers from Miking CorePPL to Miking Core, which in turn compiles to OCaml (version 4.12). The lightweight MCMC databases are functional-style maps from the OCaml Map library. We set the global step probability to 0.1 for both aligned lightweight MCMC and lightweight MCMC. We ran aligned lightweight and lightweight MCMC inference 300 times for each experiment. We burned 10% of samples in all MCMC runs.

For all experiments, we used an Intel Xeon Gold 6136 CPU (12 cores) and 64 GB of memory running Ubuntu 18.04.5.

#### 7.1 SMC: Constant Rate Birth-Death (CRBD)

This experiment considers the CRBD diversification model from [39] applied to the Alcedinidae phylogeny (Kingfisher birds, 54 extant species) [19]. We use fixed diversification rates to simplify the model, as unaligned SMC inference accuracy is too poor for the full model with priors over diversification rates. Aligned SMC is accurate for both the full and simplified models. The source code consists of 130 lines of code.† The total experiment execution time was 16 hours.

Fig. 6 presents the experiment results. Aligned SMC is roughly twice as fast and produces superior estimates of the normalizing constant. Unaligned SMC has not yet converged to the correct value −304.75 (available for this particular model due to fixing the diversification rates) for 10<sup>6</sup> particles, while aligned SMC produces precise estimates already at 10<sup>4</sup> particles. Excess resampling is a significant factor in the increase in execution time for unaligned SMC, as each execution encounters far more resampling checkpoints than in aligned SMC.

Fig. 6: SMC experiment results for CRBD. The x-axes give the number of particles. Fig. (a) shows execution times (in seconds) for aligned (gray) and unaligned (white) SMC. Error bars show one standard deviation. Fig. (b) shows box plot log normalizing constant estimates for aligned (gray) and unaligned (white) SMC. The analytically computed log normalizing constant is −304.75.

Fig. 7: SMC experiment results for ClaDS. The x-axes give the number of particles. Fig. (a) shows execution times (in seconds) for aligned (gray) and unaligned (white) SMC. Error bars show one standard deviation. Fig. (b) shows box plot log normalizing constant estimates for aligned (gray) and unaligned (white) SMC. The average estimate for aligned SMC with 10<sup>6</sup> particles is −314.35.

#### 7.2 SMC: Cladogenetic Diversification Rate Shift (ClaDS)

A limitation of CRBD is that the diversification rates are constant. ClaDS [28,39] is a set of diversification models that allow shifting rates over phylogenies. We evaluate the ClaDS2 model for the Alcedinidae phylogeny. As in CRBD, we use fixed (initial) diversification rates to simplify the model on account of unaligned SMC. The source code consists of 147 lines of code.† Automatic alignment simplifies the ClaDS2 model significantly, as manual alignment requires collecting and passing weights around in unaligned parts of the program, which are later consumed by aligned weights. The total experiment execution time was 67 hours.

Fig. 7 presents the experiment results. Twelve unaligned runs for 10<sup>6</sup> particles and nine unaligned runs for 10<sup>5</sup> particles ran out of the preallocated stack memory for each particle (10 kB). We omit these runs from Fig. 7. The consequence of not aligning SMC is more severe than for CRBD. Aligned SMC is now almost seven times faster than unaligned SMC, and the unaligned SMC normalizing constant estimates are significantly worse than the aligned SMC estimates. The unaligned SMC estimates do not even improve when moving from 10<sup>4</sup> to 10<sup>6</sup> particles (we need even more particles to see improvements). Again, aligned SMC produces precise estimates already at 10<sup>4</sup> particles.

Fig. 8: SMC experiment results for the state-space aircraft localization model. The x-axes give the number of particles. Fig. (a) shows execution times (in seconds) for aligned (gray) and unaligned (white) SMC. Error bars show one standard deviation. Fig. (b) shows box plot log normalizing constant estimates on the y-axis for aligned (gray) and unaligned (white) SMC. The average estimate for aligned SMC with 10<sup>6</sup> particles is −61.26.

#### 7.3 SMC: State-Space Aircraft Localization

This experiment considers an artificial but non-trivial state-space model for aircraft localization. The source code consists of 62 lines of code.† The total experiment execution time was 1 hour.

Fig. 8 presents the experiment results. The execution time difference is not as significant as for CRBD and ClaDS. However, the unaligned SMC normalizing constant estimates are again much less precise. Aligned SMC is accurate (centered at approximately −61.26) already at 10<sup>4</sup> particles. The model's straightforward control flow explains the less dramatic difference in execution time—there are at most ten unaligned likelihood updates in the aircraft model, while the number is, in theory, unbounded for CRBD and ClaDS. Therefore, the cost of extra resampling compared to aligned SMC is not as significant.

#### 7.4 MCMC: Latent Dirichlet Allocation (LDA)

This experiment considers latent Dirichlet allocation (LDA), a topic model used in the evaluations by Wingate et al. [47] and Ritchie et al. [38]. We use a synthetic data set, comparable in size to the data set used by Ritchie et al. [38], with a vocabulary of 100 words, 10 topics, and 25 documents each containing 30 words. Note that we are not using methods based on collapsed Gibbs sampling [17], and the inference task is therefore computationally challenging even with a rather small number of words and documents. The source code consists of 31 lines of code.† The total experiment execution time was 41 hours.

The LDA model consists of only aligned random draws. As a consequence, aligned lightweight and lightweight MCMC reduce to the same inference algorithm, and we can compare the algorithms by just considering the execution times. The experiment also justifies the correctness of both algorithms.†

Fig. 9 presents the experiment results. Aligned lightweight MCMC is almost three times faster than lightweight MCMC. To justify the execution times with our implementations, we also implemented and ran the experiment with lightweight MCMC in WebPPL [14] for 10<sup>5</sup> iterations, repeated 50 times (and with 3 warmup runs). The mean execution time was 383 s with standard deviation 5 s. We used WebPPL version 0.9.15 and Node version 16.18.0.

Fig. 9: MCMC experiment results for LDA showing execution time (in seconds) for aligned lightweight MCMC (gray) and lightweight MCMC (white). Error bars show one standard deviation and the x-axis the number of MCMC iterations.

#### 7.5 MCMC: Constant Rate Birth-Death (CRBD)

This experiment again considers CRBD. MCMC is not as suitable for CRBD as SMC, and therefore we use a simple synthetic phylogeny with six leaves and an age span of 5 age units (Alcedinidae used for the SMC experiment has 54 leaves and an age span of 35 age units). The source code for the complete model is the same as in Section 7.1, but we now allow the use of proper prior distributions for the diversification rates. The total experiment execution time was 7 hours.

Unlike LDA, the CRBD model contains both unaligned and aligned random draws. Because of this, aligned lightweight MCMC and standard lightweight MCMC do not reduce to the same algorithm. To judge the difference in inference accuracy, we consider the mean estimates of the birth diversification rate produced by the two algorithms, in addition to execution times. The experiment results show that the posterior distribution over the birth rate is unimodal,† which motivates using the posterior mean as a measure of accuracy.

Fig. 10 presents the experiment results. Aligned lightweight MCMC is approximately 3.5 times faster than lightweight MCMC. There is no obvious difference in accuracy. To justify the execution times and correctness of our implementations, we also implemented and ran the experiment with lightweight MCMC in WebPPL [14] for 3 · 10<sup>6</sup> iterations, repeated 50 times (and with 3 warmup runs). The mean estimates agreed with Fig. 10. The mean execution time was 37.1 s with standard deviation 0.8 s. The speedup compared to standard lightweight MCMC in Miking CorePPL is likely explained by the use of early termination in WebPPL, which benefits CRBD. Early termination easily combines with alignment but relies on execution suspension, which we do not currently use in our implementations. Note that aligned lightweight MCMC is faster than WebPPL even without early termination.

In conclusion, the experiments clearly demonstrate the need for alignment.

Fig. 10: MCMC experiment results for CRBD. The x-axes give the number of iterations. Fig. (a) shows execution times (in seconds) for aligned lightweight MCMC (gray) and lightweight MCMC (white). Error bars show one standard deviation. Fig. (b) shows box plot posterior mean estimates of the birth rate for aligned lightweight MCMC (gray) and lightweight MCMC (white). The average estimate for aligned lightweight MCMC with 3 · 10<sup>6</sup> iterations is 0.33.

# 8 Related Work

The approach by Wingate et al. [47] is closely related to ours. A key similarity with alignment is that executions reaching the same aligned checkpoint also have matching stack traces according to Wingate et al.'s addressing transform. However, Wingate et al. do not consider the separation between unaligned and aligned parts of the program, their approach is not static, and they do not generalize to other inference algorithms such as SMC.

Ronquist et al. [39], Turing [12], Anglican [48], Paige and Wood [36], and van de Meent et al. [46] consider the alignment problem. Manual alignment is critical for the models in Ronquist et al. [39] to make SMC inference tractable, which strongly motivates the automatic alignment approach. The documentation of Turing states that: "The observe statements [i.e., likelihood updates] should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow" [1]. Turing does not include any automatic checks for this property. Anglican [48] checks, at runtime (resulting in overhead), that all SMC executions encounter the same number of likelihood updates, and thus resamples the same number of times. If not, Anglican reports an error: "some observe directives [i.e., likelihood updates] are not global". This error refers to the alignment problem, but the documentation does not explain it further. Probabilistic C, introduced by Paige and Wood [36], similarly assumes that the number of likelihood updates is the same in all executions. Van de Meent et al. [46] state, in reference to SMC: "Each breakpoint [i.e., checkpoint] needs to occur at an expression that is evaluated in every execution of a program". Again, they do not provide any formal definition of alignment nor an automatic solution to enforce it.

Lundén et al. [24] briefly mention the general problem of selecting optimal resampling locations in PPLs for SMC but do not consider the alignment problem in particular. They also acknowledge the overhead resulting from not all SMC executions resampling the same number of times, which alignment avoids.

The PPLs Birch [31], Pyro [3], and WebPPL [14] support SMC inference. Birch and Pyro enforce alignment for SMC as part of model construction. In Pyro, this holds only for SMC; other Pyro inference algorithms use other modeling approaches. The approaches in Birch and Pyro are sound but demand more of their users than the alignment approach does. WebPPL does not consider alignment and resamples at all likelihood updates for SMC.

Ritchie et al. [38] and Nori et al. [35] present MCMC algorithms for probabilistic programs. Ritchie et al. [38] optimize lightweight MCMC by Wingate et al. [47] through execution suspensions and callsite caching. These optimizations are independent of, and potentially combine well with, aligned lightweight MCMC. Another MCMC optimization that potentially combines well with alignment is due to Nori et al. [35], who use static analysis to propagate observations backwards in programs to improve inference.

Information flow analyses [40] may determine if particular parts of a program execute as a result of different program inputs. Specifically, if program input is random, such approaches have clear similarities to the alignment analysis.

Many other PPLs exist, such as Gen [10], Venture [29], Edward [44], Stan [8], and AugurV2 [18]. Gen, Venture, and Edward focus on simplifying the joint specification of a model and its inference to give users low-level control, and do not consider automatic alignment specifically. However, the incremental inference approach [9] in Gen does use the addressing approach by Wingate et al. [47]. Stan and AugurV2 have less expressive modeling languages to allow more powerful inference. In these languages, alignment holds by construction because of the reduced expressiveness.

Borgström et al. [6], Staton et al. [43], Ścibior et al. [41], and Vákár et al. [45] treat semantics and correctness for PPLs, but do not consider alignment.

# 9 Conclusion

This paper gives, for the first time, a formal definition of alignment in PPLs. Furthermore, we introduce a static analysis technique that aligns checkpoints in PPLs and apply it to SMC and MCMC inference. We formalize the alignment analysis, prove its correctness, and implement it in Miking CorePPL. We also implement aligned SMC and aligned lightweight MCMC, and evaluate the implementations on non-trivial CRBD and ClaDS models from phylogenetics, the LDA topic model, and a state-space model, demonstrating significant improvements compared to standard SMC and lightweight MCMC.

Acknowledgments We thank Lawrence Murray, Johannes Borgström, and Jan Kudlicka for early discussions on the alignment idea, and Viktor Senderov for implementing ClaDS in Miking CorePPL. We also thank the anonymous reviewers at ESOP for their valuable comments.

# References



# **Author Index**

#### **A**

Allais, Guillaume 113 Amorim, Arthur Azevedo de 479 Ascari, Flavio 1

#### **B**

Baillot, Patrick 479 Bardin, Sébastien 59 Broman, David 535 Bruni, Roberto 1

#### **C**

Caires, Luís 421 Çaylak, Gizem 535 Cirisci, Berk 337 Costa, Diana 392

#### **D**

Dandy, Liliane-Joy 507 Dardha, Ornela 363 de Vilhena, Paulo Emílio 225 Derakhshan, Farzaneh 168 Dotzel, Myra 168 Ducousso, Soline 59

#### **E**

Enea, Constantin 337 Erhard, Julian 28

#### **G**

Gaboardi, Marco 479 Gori, Roberta 1

#### **H**

Han, Yo-Sub 90 Hattori, Momoko 197

#### **I**

Igarashi, Atsushi 281 Im, Hyeonseung 90

#### **J**

Jeandel, Emmanuel 507 Jia, Limin 168

#### **K**

Kappé, Tobias 309 Khajwal, Basim 479 Kim, Su-Hyeon 90 Kim, Youngwook 90 Knapp, Alexander 253 Ko, Sang-Ki 90 Kobayashi, Naoki 197

#### **L**

Le Brun, Matthew Alan 363 Lundén, Daniel 535

#### **M**

Mordido, Andreia 392 Mühlberger, Heribert 253 Murase, Yuito 281 Mutluergil, Suha Orhun 337

#### **N**

Nishiwaki, Yuichi 281

#### **O**

Oliveira, Bruno C. d. S. 140 Ong, C.-H. Luke 479

#### **P**

Poças, Diogo 392 Potet, Marie-Laure 59 Pottier, François 225

#### **R**

Reus, Bernhard 253 Rocha, Pedro 421 Ronquist, Fredrik 535

#### **S**

Saan, Simmo 28 Sato, Ryosuke 197 Schmid, Todd 309 Schwarz, Michael 28 Seidl, Helmut 28 Silva, Alexandra 309 Surbatovich, Milijana 168

#### **V**

Vasconcelos, Vasco T. 392 Vojdani, Vesal 28

#### **W**

Wagner, Dominik 479 Wunder, June 479

#### **Y**

Ye, Wenjia 140

#### **Z**

Zamdzhiev, Vladimir 507